Physical systems for the solution of hard computational problems

Peter Mattsson

Master of Science
Cognitive Science and Natural Language Processing
School of Informatics
University of Edinburgh
2003

Abstract

We start from Landauer's realization that "information is physical", i.e. that computation cannot be disentangled from the physical system used to perform it, and ask what the capabilities of physical systems really are. In particular, is it possible to design a physical system which is able to solve hard (i.e. NP-complete) problems more efficiently than conventional computers? Chaotic physical systems (such as the weather) are hard to predict or simulate, but we find that they are also hard to control. The requirement of control turns out to pin down the non-conventional options to either neural networks or quantum computers. Alternatively, we can give up the possibility of control in favour of a system which is basically chaotic, but is able to settle at a solution if it reaches one. However, systems of this type appear inevitably to perform a type of stochastic local search. Our conclusion is that quantum computers give us more computational "tools" (though using them requires insight into the problem to be solved), but that all other physical systems are essentially no more powerful than a conventional computer.

Acknowledgements

First, I would like to thank my supervisor, David Barber, for help, encouragement and long chats in Starbucks. I would also like to thank Douglas Adams, for giving me an insouciant attitude to deadlines, as well as Wolfgang Lehrach, David Heavens and others (you know who you are!) for getting me through the final stretch without loss of sanity or humour.

Declaration

I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified.

(Peter Mattsson)

To my parents.

Table of Contents

1 Introduction
  1.1 Complexity theory
  1.2 The satisfiability problem
    1.2.1 Search space structure
  1.3 Solution methods for the satisfiability problem
    1.3.1 Local search methods
    1.3.2 Genetic/evolutionary methods
    1.3.3 Message passing methods
  1.4 Summary
2 On the complexity of physical systems
  2.1 The neon route-finder
    2.1.1 Discussion
  2.2 General properties of physical systems
  2.3 Analogue vs. digital
    2.3.1 A diversion into chaos theory
  2.4 The problem of local continuity
    2.4.1 The Zealot Model
    2.4.2 The Ising model
  2.5 Summary
3 Novel computing devices
  3.1 Mechanical computers
  3.2 Electrical analog computers
  3.3 Artificial neural networks
  3.4 Circuit-based "consistency computers"
    3.4.1 The travelling salesman problem
    3.4.2 The satisfiability problem
    3.4.3 Integer factorization
    3.4.4 Conclusions
  3.5 DNA computers
  3.6 Amorphous computers
  3.7 Summary
4 Quantum computers
  4.1 Quantum mechanics, a brief introduction
    4.1.1 Wave-particle duality
    4.1.2 The Schrödinger wave equation
    4.1.3 Quantum states
    4.1.4 Operators and Measurement
    4.1.5 Quantisation and spin
    4.1.6 Quantum parallelism, entanglement and interference
    4.1.7 Heisenberg's uncertainty relations
    4.1.8 The problem of measurement
    4.1.9 Quantum computers
    4.1.10 Approaches to building quantum computers
    4.1.11 Error correction
  4.2 Quantum algorithms
  4.3 Grover's database search algorithm
  4.4 Shor's factoring algorithm
  4.5 What makes quantum computers powerful?
    4.5.1 Classical versions of Grover's algorithm
    4.5.2 Classical entanglement
    4.5.3 Conclusions
  4.6 Summary
5 Test case
  5.1 Error correcting codes
  5.2 Complexity of the decoding problem
  5.3 Cryptography
    5.3.1 The Barber scheme
    5.3.2 Discussion
  5.4 Solving the decoding problem using physical models
    5.4.1 Ising model formulation
    5.4.2 Neural network formulation
  5.5 Summary
6 Conclusions
  6.1 Possibilities for further work
A Miscellaneous quantum questions
  A.1 Quantum "nonlocality" and the EPR experiment
  A.2 Are space and time discrete?
Bibliography
And finally...

List of Figures

1.1 Likely structure of the NP problem class, taken from [58].
1.2 For random 3-SAT problems, the time taken to find a solution (here measured by the number of algorithm calls) peaks at α ≈ 4.25, and the peak becomes increasingly marked as the number of variables rises. (Adapted from [47] by Bart Selman.)
1.3 Probability that a random 3-SAT problem will be satisfiable, as a function of α. Note the sharp drop centered around α ≈ 4.25. (Adapted from [47] by Bart Selman.)
1.4 Phase diagram for 3-SAT. The variables plotted are: Σ, defined as log(no. of satisfiable states)/N; e0, the minimum number of violated clauses per variable; and eth, the typical energy (per variable) of the most numerous local minima. (An energy of zero implies that all clauses are satisfied.) Taken from [54].
2.1 The shortest route between Imperial College and Admiralty Gate, as found by the neon route-finder. (Reproduced from the Imperial College Reporter.)
2.2 Example "route-finding" circuit.
2.3 A circuit with multiple branches can be considered as a circuit with only one branch, but many leaves. To do this, we need to calculate the equivalent resistance of each path through the multiply-branching circuit.
2.4 Ising "wire". The solid lines represent +J-weight bonds and the dotted lines represent 0-weight bonds.
2.5 Ising "AND gate". The solid lines represent +J-weight bonds and the dotted lines represent 0-weight bonds. The inputs are on the left, and the output is on the right. The middle input must be an impurity, used to bias the result.
3.1 Basic model of a neuron.
3.2 Co-operative pair of neurons.
3.3 Travelling salesman model. The lines represent wires, with the arrows representing the direction in which the potential difference across that wire encourages the current to flow. The cities – C1 to C4 – are small artificial neural networks which have the link wires as inputs and outputs.
3.4 Travelling salesman city node network.
3.5 Basic structure of the satisfaction problem network, for a seven-variable problem.
4.1 Fredkin's "billiard ball" model of computation. The presence or absence of a ball represents a value of 1 or 0, respectively, so this implements a two-input, four-output reversible logic gate.
4.2 Physical implementation of Deutsch's algorithm. A half-silvered mirror divides an input laser beam in two, sending one part along Path 0 and the other along Path 1. The two beams interfere again at a second half-silvered mirror, and a pair of photo-sensitive detectors record the results. Taken from [66].
5.1 "Ising model" corresponding to the example generator matrix. This model has non-local bonds, but it is still planar, so it is still easy to find the ground state exactly.
5.2 Probability of successful decoding, P(N), for a noise level of 5%. The success probability drops, apparently exponentially, as N increases.
5.3 F(pc, pe) for pe = 0.1, for various matrix densities (n, m). For (2, 4), the naïve and error estimates are correlated, but as soon as the density rises above this they fail to be.
5.4 F(pc, pe) for n = 2, m = 4 and values of pe between 0.1 and 0.4.
This shows that noise levels of up to about 20% can be tolerated. 5.5 104 By arranging the spins into a grid, simple nearest-neighbour bonds give most of the spins four bonds each. A few non-local bonds round the edges can then ensure that all the spins have four bonds without sacrificing the planarity of the model. . . . . . . . . . . . 110 xi Chapter 1 Introduction Information is physical. Rolf Landauer Information surrounds us, and computation is constantly taking place; whether it is the wind manipulating leaves and branches, or electricity manipulating bits and switches, the principle is the same. Any physical process can be thought of as a computation; why, then, should we restrict attention to the limited class of semiconductor devices that are currently used in computing? This project was motivated by the thought that maybe physics could offer us a few more tools than we have yet taken advantage of; we wanted to find, in short, the source of computational power. Our very existence is proof that other computing paradigms are possible, and can be powerful; our brains are capable of much that we have, as yet, been unable to replicate using semiconductor-based computers. Is this a failure of ingenuity on our part, or does the physical makeup of a brain bestow it with fundamentally greater power? Much has been made of its massive parallelism, with 1011 neurons, each with tens of thousands of connections; is this the key? Alternatively, does its adaptive, stochastic operation give it a greater ability to find approximate answers, at the cost of precision? Does nature have any other tricks up her sleeve? Are some calculations simply impossible? These are big questions, and 1 Chapter 1. Introduction 2 we would not claim to have all the answers, but we do aim at least to clarify the options and rule out some of the possibilities. The remainder of this chapter will be devoted to a discussion of what we mean by the term “hard computational problem”, and of some of their typical properties. The following chapter will then develop very general arguments which put upper bounds on the computational ability of any imaginable physical system. Following that, we discuss several alternative approaches: analog, neural, DNA and quantum computation. From an examination of their strengths and weaknesses, we then focus on a sample problem, which has the advantage of being naturally realizable on several different physical systems. Please note that, as this thesis has to cover a wide range of subject-matter, it contains an unusually large component of introductory and review material. Those who are familiar with particular areas should feel free to skim or skip through this material. 1.1 Complexity theory I don’t know much about Art, but I know what I like. Anonymous This section is intended as a brief, general introduction to the aspects of complexity theory that will be required for the remainder of the thesis. As such, it is neither complete nor rigorous; for that, books such as [53] provide a good starting point. The two fundamental classes of problems that we will consider are known as polynomial-time (P) and nondeterministic polynomial-time (NP). Problems in the first class can be reliably solved by algorithms whose maximum execution time (on current computer hardware) is bounded by a polynomial in the size of the problem: they can be solved in “polynomial time”. For a problem to be in the NP class, by contrast, all that is required is that it be possible to verify or reject Chapter 1. 
Introduction 3 a possible solution in polynomial time; we just need to “know it when we see it”. The term “nondeterministic” refers back to the original definition of these classes in terms of Turing machines. Problems in P are solvable in polynomial time on a standard (deterministic) Turing machine; those in NP are solvable in polynomial time on a nondeterministic Turing machine. A nondeterministic Turing machine is able, at suitable points in a computation, to take random “guesses”; its running time is measured for the case where all the guesses turn out to be right. While problems in P are also in NP (i.e. P is a subset of NP), there are also problems which are thought to be in NP but not P1 . (Figure 1.1 shows a likely guess as to the structure of NP.) The most efficient known algorithms for these problems typically run in “exponential time”, i.e. their execution time is bounded by an exponential function of the problem size. A well-known example of such a problem is the so-called “travelling salesman problem”. In this problem, a salesman has to visit clients in a number of different cities, and we must find the shortest possible route that visits all these cities exactly once, starting and finishing at the salesman’s home base. A naı̈ve method of solving this problem would be to simply enumerate all possible routes, then pick the shortest. However, for N cities, this yields (N − 1)!/2 possibilities, a number which exceeds 10100 by N = 72. A number of better algorithms have been developed, but all run in exponential time. This is a hallmark of this type of problem: the obvious algorithm involves checking an exponentially increasing number of possibilities (because checking an answer is the one thing we know to be easy), and there do not appear to be any significant shortcuts. A crucial property of the NP problem class is that there exist problem types which can be reduced to any other problem type in NP (i.e. we can restate any problem in NP as an example of one of these types); what’s more, this can be done in polynomial time. These problems are known as NP-hard; if they also happen to fall into the NP class, they are called NP-complete. This property means that, if we were able to find a method for solving arbitrary problems of a 1 Although this is widely considered to be true, it is a famous open problem, for whose solution the Clay Mathematics Institute is offering a prize of $1,000,000. Chapter 1. Introduction 4 Figure 1.1: Likely structure of the NP problem class, taken from [58]. given NP-complete type in polynomial time, we could use this to solve any NPclass problem in polynomial time. For us, it means that we can focus on the task of designing a physical system to solve a particular NP-complete problem (the “satisfiability problem”, to be introduced shortly) and draw general conclusions about the ability of such systems to solve NP-complete problems. Although all known exact algorithms for NP-complete problems run in exponential time in the worst case, it turns out that, in fact, the worst case seldom happens. The majority of examples of NP-complete problems can actually be solved quite easily, by heuristic or stochastic algorithms. In addition, for some classes of NP-complete problem (such as the travelling salesman problem) good approximate solutions can almost always be found. 
This means that the existence of an algorithm which can exactly or approximately solve the majority of instances of an NP-complete problem, while useful in practice, says little about the complexity of the problem as a whole. For example, proteins are synthesized from amino acids and subsequently relax (“fold”) into a lower-energy configuration, in general choosing the configuration with the lowest possible energy, or one close to it. Finding the minimum-energy fold is, in general, an NP-complete problem [60], but approximate solutions are usually easy to find (and the body has mechanisms for destroying proteins which Chapter 1. Introduction 5 fold wrongly). In addition, the crucial biological features of proteins are centred on a small number of binding sites; if these form due to local folding, the protein can still be useful even if it is quite far from the minimum-energy configuration. All this means that the routine solution of this problem by the human body does not tell us anything useful about the computational power of the physical processes involved; the question of finding the minimum-energy configuration may even be irrelevant. Hard instances of NP-complete problems share three features, which lie at the root of their difficulty. First, they have a number of different local optima (i.e. points in their phase space where a slight adjustment of any of the parameters makes matters worse); this traps any solution method which works by moving towards better solutions by considering only the local landscape. Secondly, it is not possible to “factorize” the problem into smaller pieces and solve them separately. For example, in the travelling salesman problem, it is not usually possible to consider parts of the route in isolation: the cities chosen for one part of the route affect the choices available to other parts. By contrast, a routefinding problem (where we simply want to find the shortest route from one point to another) does not have this constraint: adjusting one part of a route has no impact elsewhere, provided the start and end points of the section are kept fixed. The third difficult feature of NP-complete problems is that they appear statistically random: rather like the functions used in random number generators, their structure does not always show up in statistical tests. As a result, any solution method which uses statistical means to identify solutions (such as belief propagation, for example), will be bound to fail. The field of cryptography provides many good examples of this, as most methods for breaking cryptographic codes rely either on statistical features of the original message showing up in the ciphertext, or on regularities introduced by the encoding process. For example, a simple substitution cipher (where e.g. all ‘a’s are replaced by ‘c’s, all ‘b’s by ‘g’s and so on) provides a huge number of possibilities, namely 26! (about 4 × 1026 ), but can be broken by hand in a matter of minutes by taking advantage of the characteristic frequencies of the different letters in a typical message. Switching Chapter 1. Introduction 6 to a homophonic cipher (where each letter is replaced by one of a set of glyphs, the size of the set allocated to each letter being proportional to its frequency) washes out this information, but it is still possible to break the cipher by considering characteristic letter groupings. Once we completely remove all statistical cues, however, the code becomes essentially unbreakable. 
Mertens [59] was able to show that, according to a battery of statistical tests, the hardest NP-complete problems were indistinguishable from the task of identifying whether a given number was in a randomly-ordered list. The most damaging consequence of this is that there can be no reliable measure of how close a particular point in the solution space of an NP-complete problem is to the optimal solution. Any measure we might use – such as the length of the tour in a travelling salesman problem – will give us no indication of how much of the tour matches an optimal tour. (This is why, as discussed above, finding a good solution is a very different thing from finding the optimal one.) It is interesting to draw a comparison at this point with P-complete problems (which play an analogous role for the P class, and represent the “hardest” problems in the class). These can be factorized, but only in the sense that they can be divided into pieces and the pieces solved one at a time; it is not possible to solve them in parallel, as the initial conditions of each piece depend on the results of solving at least one other piece. Thus both P-complete and NP-complete problems are, in some sense, tied together; the difference is that NP-complete problems are tied together in a more complex manner. In effect, the “pieces” of an NP-complete problem each depend on the solution of all the other pieces, so there is no way to start the solution process without guessing at some of the answer. The degree to which an example of an NP-complete problem is “tangled” accounts for the fact that not all such problems are hard to solve. Examples with few constraints, which are only loosely connected, usually have a number of different solutions. In addition, it is often the case that treating the problem’s parts as being independent is a reasonable (and helpful) approximation. At the other end Chapter 1. Introduction 7 of the scale, problems with many constraints are usually so heavily connected that it is easy to either find a proof that the problem has no solution or use only a few guesses to extrapolate out to the full solution. It is only in the middle ground, where there are enough constraints to reduce the number of solutions without seriously compromising the size of the search space, that hard examples are to be found. 1.2 The satisfiability problem You are chief of protocol for the embassy ball. The crown prince instructs you to either invite Peru or to exclude Qatar. The queen asks you to invite either Qatar or Romania or both. The king, in a spiteful mood, wants to snub Peru or Romania or both. Is there a guest list that will satisfy the whims of the entire royal family? A satisfiability problem, by Brian Hayes [39] We are now ready to introduce our featured problem; this will appear, in various guises, several times in the course of our discussion, so it is worth giving a general introduction to it here. The satisfiability problem is arguably the archetypal example of an NP-complete problem (it was the first, in fact, to be proved NPcomplete [40]), and has been heavily studied, making it ideal for our purposes. The problem can be stated as follows: given a Boolean formula (i.e. a formula involving only Boolean variables), is it satisfiable? In other words, is it possible to allocate values to the variables in the formula such that the whole formula is true? 
In analysing this question, we can take advantage of a useful property of Boolean formulas: they can always be restated in both conjunctive normal form (CNF) and disjunctive normal form (DNF). In CNF, the formula is written as a series of conjoined clauses (i.e. clauses coupled together by AND operators), such that each subclause consists of a series of disjoint – and possibly negated – variables (i.e. variables coupled together by OR operators). To be more precise, if we use ∧ for AND, ∨ for OR, ¬ for NOT and T and F for true and false, a formula in CNF is of the form

C1 ∧ C2 ∧ C3 ∧ · · · ∧ Cn,

where the Ci are of the form

Ci = X1 ∨ X2 ∨ X3 ∨ · · · ∨ Xm.

The Xj can each either be xj, ¬xj or F, as desired, and the xj (j = 1, . . . , m) are the Boolean variables in the formula. (Normally, we drop the trivial F terms.) A common formulation of the satisfiability problem is in terms of a CNF formula with exactly k non-trivial terms per subclause; this is known as k-SAT.

Clearly, 1-SAT is trivial: simply work through the subclauses, assigning variable values so as to make them true, until either we run into a contradiction or all subclauses are satisfied. The next case, 2-SAT, can also be handled relatively easily, though it is no longer trivial. Consider the following example:

(A ∨ ¬B) ∧ (¬A ∨ C) ∧ (B ∨ ¬C)

One way to solve this is to start by assuming that A is true; this satisfies the first subclause. To satisfy the second subclause, we then have to assume that C is true. Finally, to satisfy the third subclause we require that B is true. (Had we started by assuming that A was false, we would have found that B was false – to satisfy the first subclause – and that C was also false – to satisfy the third subclause.) In this way, one starting assumption allows us to make a cascade of deductions, thereby dealing with either the whole problem or at least a non-trivial subset of it. It turns out that this allows all 2-SAT problems to be solved in polynomial time [38].

As soon as we move on to 3-SAT, things change markedly. Consider now the example

(A ∨ ¬B ∨ C) ∧ (¬A ∨ B ∨ C) ∧ (¬A ∨ ¬C ∨ D)

Assuming that A is true still satisfies the first subclause, but it leaves us with a 2-SAT problem to solve as far as the remaining clauses are concerned. In this case, this is not a major problem, but it would have been if not all the subclauses had happened to involve A: we would then have had to make a list of all the solutions to the 2-SAT problem, trying them one-by-one as partial solutions to the remainder of the subclauses. If we happen to hit a problem where the 2-SAT subproblems have large numbers of solutions, we potentially have a lot of work to do, thanks to this additional branching factor. As a result of this additional complexity, 3-SAT turns out to be NP-complete.

After 3-SAT, nothing very much changes: all k-SAT problems with k > 2 are NP-complete. The easiest way to see why this should be so is to note that all other SAT problems can be converted into examples of 3-SAT. For example, the 4-SAT clause A ∨ B ∨ C ∨ D can be rewritten as a pair of 3-SAT clauses by introducing a new variable, E: satisfying both A ∨ B ∨ E and C ∨ D ∨ ¬E imposes the same restrictions on A, B, C and D as the original clause. In this process, we have made the problem bigger, both in terms of the number of clauses and the number of variables, but only by a polynomial factor.
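The clause-splitting trick just described is easy to mechanise. The following short Python sketch (ours, not from the thesis; the integer encoding of literals – positive for a variable, negative for its negation – is an assumption made purely for illustration) repeatedly applies the same auxiliary-variable construction to reduce any long clause to 3-SAT clauses.

```python
def to_3sat(clause, next_aux):
    """Split one k-literal clause (k > 3) into an equivalent chain of 3-literal
    clauses, introducing fresh auxiliary variables numbered from next_aux.
    Literals are non-zero ints: +v for a variable, -v for its negation."""
    if len(clause) <= 3:
        return [clause], next_aux
    # (x1 v x2 v E) and (x3 v ... v xk v -E), as in the 4-SAT example above,
    # applied repeatedly until every clause has at most three literals.
    aux = next_aux
    head = [[clause[0], clause[1], aux]]
    rest = [-aux] + clause[2:]
    tail, next_aux = to_3sat(rest, next_aux + 1)
    return head + tail, next_aux

# Example: the 4-SAT clause A v B v C v D, with A..D numbered 1..4.
clauses, _ = to_3sat([1, 2, 3, 4], next_aux=5)
print(clauses)   # [[1, 2, 5], [-5, 3, 4]]
```

Applied to the 4-SAT clause above it reproduces the pair (A ∨ B ∨ E) ∧ (¬E ∨ C ∨ D); longer clauses give the usual chain of linked 3-literal clauses, at the cost of one new variable per split.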
A curious feature of this problem is that we could also have chosen to write the formula in DNF, which is a disjunction of conjunctions rather than a conjunction of disjunctions. (In other words, the variables are conjoined to make up the subclauses, and the subclauses are disjoined to make up the formula.) For example, the 2-SAT problem considered above can also be written as

(A ∧ B ∧ C) ∨ (¬A ∧ ¬B ∧ ¬C)

In this form, the problem suddenly becomes very simple: the solutions are essentially written out in the statement of the formula. Essentially, the hard part lies in the conversion between CNF and DNF. This is a general feature of NP-complete problems: there is always a way of stating the problem such that the answer is straightforward, even obvious, but finding it is hard. Another example of this is integer factorization: if we happen to write the number to be factored using a number base equal to one of the factors, it becomes trivial.

Another useful formulation of SAT problems is to use binary-valued variables in place of the Boolean ones. Thus, for example, satisfying A ∨ B ∨ C is equivalent to solving a + b + c > 0, if we map a = 0 to A = F, etc. This shows that satisfiability problems are essentially just optimization problems in disguise, particularly if we take an unsatisfiable formula and ask for a variable assignment satisfying as many clauses as possible, known as the MAX-SAT problem.

Several variations on the basic satisfiability problem have also been considered. Two are particularly interesting for us: XOR-SAT (where the OR operators are replaced by XOR) and 1-in-k-SAT (where exactly one term in each subclause must be satisfied). Written in terms of binary-valued variables, these both yield a series of equations, rather than inequalities. For example, satisfying A ∨ B ∨ C as part of a 1-in-3-SAT problem requires us to solve a + b + c = 1. As part of an XOR-SAT problem, this would also be true, but only if all arithmetic is done modulo 2. (The formula A XOR B XOR C can be satisfied either by A = B = C = T, or by making one variable true and the other two false; both alternatives give a + b + c = 1 when arithmetic is done modulo two.)

Since 1-in-k-SAT and XOR-SAT can both be written as a system of linear equations, it might appear that both are simpler than k-SAT, but this is not quite true. If there are enough clauses for the system of equations to have a unique solution, then it is true, but matters change when the number of subclauses is smaller than this. In this case, several of the variables are left to act as parameters in the definition of the rest. For XOR-SAT, the fact that arithmetic is done modulo two allows these variables to be set arbitrarily – yielding an easy problem – but the situation for 1-in-k-SAT is much more flexible. Here, there is nothing to force a solution of the new problem to map onto a solution of the old: a + b + c = 1 can be satisfied by a = 3, b = −1, c = −1, for example. In the worst case, we might have to try all possible allocations of the parameter variables to find one that gives legal values to all the others.² As a result, 1-in-k-SAT can still be NP-complete, though it will still be much easier to solve than a k-SAT problem of the same size.

² This is a special case of the problem of solving a system of diophantine equations, for which it is known that no general algorithm exists. While that does not mean that no algorithm can exist for this special case, it does indicate that we have strayed into difficult territory.
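To illustrate why the fully-determined XOR-SAT case is easy, here is a small sketch (our own illustration, not code from the thesis) that solves an XOR-SAT instance by Gaussian elimination modulo 2. It assumes un-negated literals and 0-based variable indices, and it simply sets any leftover free variables to zero, which – as noted above – is always allowed when working modulo two.

```python
def solve_xorsat(clauses, n):
    """Solve an XOR-SAT instance by Gaussian elimination over GF(2).

    Each clause is a list of variable indices whose XOR must be true
    (negated literals are left out of this sketch for simplicity).
    Returns a satisfying 0/1 assignment, or None if none exists.
    """
    # Augmented matrix: n coefficient bits per row plus a right-hand side of 1.
    rows = []
    for clause in clauses:
        row = [0] * (n + 1)
        for v in clause:
            row[v] ^= 1          # repeated variables cancel modulo 2
        row[n] = 1
        rows.append(row)

    pivot_cols, r = [], 0
    for col in range(n):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col]), None)
        if pivot is None:        # no pivot: this variable is a free parameter
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(col)
        r += 1

    if any(not any(row[:n]) and row[n] for row in rows):
        return None              # a row has reduced to 0 = 1: unsatisfiable

    assignment = [0] * n         # free variables set arbitrarily to 0
    for i, col in enumerate(pivot_cols):
        assignment[col] = rows[i][n]
    return assignment

# A XOR B XOR C, with A, B, C as variables 0, 1, 2: an odd number must be true.
print(solve_xorsat([[0, 1, 2]], 3))   # e.g. [1, 0, 0]
```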
In the case of 3-SAT, it might appear that the need to resort to some variant of brute force search is a consequence of our lack of deductive skill: the hardest cases involve a large number of constraints, so it looks like we should be able to apply logical deduction to solve the problem. We can indeed do so, but uncovering the correct sequence of steps amounts to conversion of the problem to DNF, another hard problem. For 1-in-3-SAT, by contrast, the approach outlined above seems to exhaust the constraints and leave us with a purely random problem.

Finally, the obvious measure of how close we are to a solution is the number of clauses that are satisfied. However, this can be a useless measure, as the following toy example indicates:

(A ∨ ¬B) ∧ (A ∨ ¬C) ∧ (A ∨ B) ∧ (¬A ∨ B) ∧ (B ∨ ¬C)

This example has two solutions: A = B = C = T and A = B = T, C = F. Imagine, though, that we start from the guess that A = B = F, C = T, which only satisfies two of the five clauses. If we flip the value of one of the variables, we find two options which satisfy three clauses – A = F, B = C = T and A = C = T, B = F – and one which satisfies four clauses – A = B = C = F. The first two options would lead us to within one flip of a solution, but the last satisfies more clauses. Accordingly, "hill climbing" using the number of satisfied clauses as a guide would lead us in the wrong direction. In fact, A = B = C = F is a local minimum; if we only accept flips that reduce the number of unsatisfied clauses, there is no way out. (If we only require that the number of unsatisfied clauses does not go up, however, we can reach a solution by flipping A, then B.) The only reliable guide of how close we are to a solution is the minimum number of flips required to reach a solution, but just finding this value for one variable allocation is as hard as solving the whole problem. On the other hand, this kind of hill-climbing can lead us rapidly to an approximate solution of a MAX-SAT problem; we just cannot expect to be anywhere near the exact solution.

1.2.1 Search space structure

Given that the number of satisfied clauses is an unreliable measure of how close we are to a solution, it is important to gain some degree of understanding about the structure of the search space of a typical satisfiability problem. The most natural approach to brute-force search for this type of problem is to allocate values to some of the variables (typically the most-constrained first), make as many deductions as possible from them, allocate values to more variables, and repeat until either a contradiction is derived or all variables are found. (This is known as the Davis-Putnam procedure; see [57] for further details.) An obvious measure of how useful this method will be is the number of clauses in the problem relative to the number of variables, and indeed the average difficulty of a SAT instance can be directly related [55] to the ratio between the number of clauses and the number of variables, which we shall call α. (For a more recent review, see [56].)
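For readers who would like to experiment with this clause-to-variable ratio, the sketch below (our own, and only one of several common conventions for sampling random clauses) generates a random 3-SAT instance at a chosen α and counts how many clauses a candidate assignment satisfies.

```python
import random

def random_3sat(n_vars, alpha, rng=random):
    """Generate a random 3-SAT instance with about alpha * n_vars clauses.
    A literal is a pair (variable_index, is_negated)."""
    n_clauses = int(round(alpha * n_vars))
    instance = []
    for _ in range(n_clauses):
        variables = rng.sample(range(n_vars), 3)      # three distinct variables
        instance.append([(v, rng.random() < 0.5) for v in variables])
    return instance

def satisfied_clauses(instance, assignment):
    """Count clauses with at least one true literal under a 0/1 assignment."""
    return sum(
        any(assignment[v] != neg for v, neg in clause)
        for clause in instance
    )

# Around the critical ratio alpha ~ 4.25, random instances are hardest.
inst = random_3sat(n_vars=100, alpha=4.25)
guess = [random.randint(0, 1) for _ in range(100)]
print(satisfied_clauses(inst, guess), "of", len(inst), "clauses satisfied")
```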
This result comes from an analysis of the problem in terms of statistical mechanics (more usually used to study systems of many identical interacting physical particles, such as magnetic materials or gases), which shows that there is a "phase transition" at this point. In a physical system, this would correspond to a change from e.g. liquid to solid, or paramagnetic to ferromagnetic; here, it corresponds to a change from satisfiable (on the average) to unsatisfiable.

Figure 1.2 shows the change in average computation time as α increases, using the Davis-Putnam procedure. Not only is there a sharp peak at the phase transition, but this peak becomes sharper as the problem size increases. This peak, in fact, is exponentially large, reflecting the fact that such methods cannot do very much better than brute-force search of all 2^N possible variable allocations.

Figure 1.2: For random 3-SAT problems, the time taken to find a solution (here measured by the number of algorithm calls) peaks at α ≈ 4.25, and the peak becomes increasingly marked as the number of variables rises. (Adapted from [47] by Bart Selman.)

Phase transitions are characterised by a sharp peak in a parameter associated with the system, known as the order parameter; here, this parameter is the average time required to find a solution. Typically, the transition happens because interactions between the particles give rise to correlations; as the transition point is approached, these become stronger, winning out over random noise (such as thermal fluctuations) and leading to ever-larger correlated regions. Eventually, the correlations become the most important feature of the system and the regions coalesce into one. The order parameter is usually a measure of the typical correlation length; at the transition point, this increases to encompass the whole system. On the other side of the transition point, the correlations become so strong that the system divides up into separate regions. Each region is so strongly correlated that it is impossible for neighbouring regions to force them to match their correlation. As the order parameter increases, these regions become smaller and smaller.

The crucial ratio αc, as we can see from Figure 1.2, is about 4.25 for 3-SAT, corresponding to each variable appearing about 13 times. Above this point, it becomes likely that small clusters of clauses will be mutually inconsistent, allowing the problem's unsatisfiability to be easily shown; below it, the problem is only loosely connected, and can be easily factorized. This is confirmed by Figure 1.3, which shows a sharp transition at αc. From this point of view, the reason why 2-SAT is easier is that even the hardest examples never have enough "elbow room" to allow the search space to increase exponentially.

Figure 1.3: Probability that a random 3-SAT problem will be satisfiable, as a function of α. Note the sharp drop centered around α ≈ 4.25. (Adapted from [47] by Bart Selman.)

The structure of the set of solutions to a typical problem goes through several distinct phases as α increases. Figure 1.4 shows that, below α ≈ 3.8, the most numerous local minima are actually complete solutions, but that this becomes increasingly less likely as α increases. Between α ≈ 3.8 and αc, the problem is satisfiable, but finding a satisfiable minimum is harder. In [52], the phases are given as:

• α < αd ≈ 3.86, where the set of all solutions is connected, i.e. there is a path of mutually adjacent solutions that joins any two solutions of the set;

• αd < α < αb ≈ 3.92, where the set of all solutions breaks up into a number of disconnected clusters;

• αb < α < αc where, for each cluster, there are literals which take the same value for all solutions in the cluster; these are known as the backbone of the cluster.

Figure 1.4: Phase diagram for 3-SAT. The variables plotted are: Σ, defined as log(no. of satisfiable states)/N; e0, the minimum number of violated clauses per variable; and eth, the typical energy (per variable) of the most numerous local minima. (An energy of zero implies that all clauses are satisfied.) Taken from [54].

As we approach αc, not only do the solution clusters get smaller and fewer in number (making them harder to find), but this results in an increase in the number of pseudo-solution clusters (i.e. clusters associated to a local, rather than a global, minimum). Essentially, the allocation space divides up into an increasing number of clusters, but fewer and fewer of them correspond to global minima. In the worst case (at αc) the total number of clusters is exponential in the size of the problem, but satisfiable problems typically only have one or two solutions; we wind up looking for a needle in a haystack.

1.3 Solution methods for the satisfiability problem

Apart from deterministic search-and-backtrack algorithms such as Davis-Putnam (which are impractical for problems of any useful size), the practical search methods that have been found for the satisfiability problem can be divided into three broad classes: local search, genetic/evolutionary methods and message passing methods.

1.3.1 Local search methods

Despite our comments above, the most popular local search solution methods for the satisfiability problem invariably use the number of satisfied clauses as a guide to the solution process. The problem is that no other guides are available, and methods which avoid its use are reduced to essentially bouncing around in the search space until they hit a solution; not a very efficient approach for large problems. As mentioned above, solution methods can use the structure of the problem to avoid having to search the entire space of allocations, but the best methods we know of (see [41], [42]) still have a worst-case bound of about 1.5^N for a problem of size N. The method of [42] is particularly interesting: it involves picking a set of points in the allocation space and exhaustively searching the area within a given number of flips for a solution. The set of start points is chosen to form a covering code, i.e. such that all points in the allocation space should fall within the search area of one of the start points. In this way, the algorithm is guaranteed to find the solution, if one exists. By relaxing the constraint that the code must be a perfect covering code, Schöning [43] also found a very reliable – though not perfect – algorithm with a worst-case bound of (4/3)^N. Unfortunately, however, even this bound means that a problem with 400 variables has a search space of about 10^50, which is clearly impractical.

By contrast, probabilistic local search methods can work very well, provided that they take advantage of the number of satisfied clauses as a measure of success.
Without doing this, the only ways to pick the next variable to flip are either to select it randomly, or to pick an unsatisfied clause at random and then select one of the variables involved in it. The first choice is clearly hopeless; the second turns out to work surprisingly well for 2-SAT [44] – finding a satisfying assignment in O(N 2 ) – but does not work well for 3-SAT above αd . The next-simplest method, known as GSAT, always flips so as to reduce the number of unsatisfied clauses as far as possible. Clearly, this will almost invariably get stuck in a local minimum, but, by restarting the search every time this happens, the algorithm still outperforms deterministic search-and-backtrack methods [46]. Allowing the algorithm to make “sideways” (i.e. non-improving) moves if nothing better is available improves its performance substantially [46]. An alternative means of preventing this type of method getting stuck is provided by our first physically-motivated method: simulated annealing. In this method, we select a variable at random, and calculate the net change in the number of satisfied clauses, δ. We then accept the flip with a probability proportional to e−δ/T , where T is a notional “temperature”. This is motivated by the behaviour of magnetic materials, of which we will see more later. Some variants of the method [48] modify this by accepting the flip with certainty if δ < 0, but this is not physically motivated, and makes relatively little difference. Interestingly, simulated annealing is outperformed by a method called WalkSAT [45]. WalkSAT improves on GSAT by applying GSAT with probability p, and picking a random variable in a random unsatisfied clause with probability 1 − p. (The best choice for p turned out to be about 0.5–0.6.) The difference is quite significant: in [45], simulated annealing was “only” able to solve problems with up to about 600 variables, whereas WalkSAT was able to manage 2000 variables. Simulated annealing was, however, better than the basic GSAT method, or one which applied GSAT with probability p and picked a random variable with probability 1 − p. This shows that deliberately selecting the direction of steepest descent is no better – in fact, slightly worse – than simply trying for some degree Chapter 1. Introduction 19 of descent. The success of WalkSAT can be attributed to a judicious combination of these two types of descent. Recent surveys (see e.g. [49]) show that WalkSAT, while not the best possible algorithm, still performs well. 1.3.2 Genetic/evolutionary methods The idea behind genetic algorithms is to mimic the natural selection process, whereby “fit” individuals survive and propagate, while unfit individuals die off. This “selection pressure” makes the species as a whole evolve to better fit its environment. In genetic algorithms, the “individuals” are candidate solutions to a problem, and fitness is determined according to how well they solve the problem. In terms of the satisfiability problem, the individuals can be straightforwardly represented by a binary string, whose ith digit represents the value of the ith variable. Offspring can be formed by crossover (forming a new individual from the initial part of one “parent” and the latter part of another, with the crossover point determined randomly), and individuals can be mutated by randomly flipping bits with some probability. The problem, of course, is that the only way of measuring fitness is by reference to the number of satisfied clauses. 
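As a purely illustrative sketch of this setup – our own code, not taken from [50] or the other references – the basic operators of such a genetic SAT solver might look as follows, with fitness measured, as just noted, by the number of satisfied clauses.

```python
import random

def fitness(individual, clauses):
    """Number of satisfied clauses; a clause is a list of (variable, negated)."""
    return sum(
        any(individual[v] != neg for v, neg in clause)
        for clause in clauses
    )

def crossover(parent_a, parent_b, rng=random):
    """One-point crossover: initial part of one parent, latter part of the other."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(individual, p_flip, rng=random):
    """Flip each bit independently with probability p_flip."""
    return [bit ^ 1 if rng.random() < p_flip else bit for bit in individual]

def next_generation(population, clauses, p_flip=0.01, rng=random):
    """One generation: the fitter half survives and breeds the other half.
    (Assumes a population of at least two individuals.)"""
    ranked = sorted(population, key=lambda ind: fitness(ind, clauses), reverse=True)
    parents = ranked[: len(population) // 2]
    children = [
        mutate(crossover(rng.choice(parents), rng.choice(parents), rng), p_flip, rng)
        for _ in range(len(population) - len(parents))
    ]
    return parents + children
```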
Genetic algorithms often work best when combined with local search, and indeed Folino et al. [50] found a combination with WalkSAT to be beneficial. Rather than mutating randomly, the individuals were mutated to the local maximum allocation as determined by WalkSAT. Applied to a population of 320, this procedure halved the number of iterations required as compared to 320 parallel WalkSAT instances. 1.3.3 Message passing methods Message passing methods view satisfiability problems as graphical models. These models have two sets of nodes: one set represents the variables, and the other represents the constraints. Edges connect the variable nodes to the constraint Chapter 1. Introduction 20 nodes in which they are featured. Finally, we associate conditional probabilities with each of the nodes: probabilities that the clauses are true, given the possible allocations of their variables, and probabilities that the variables are true, given the possible states – satisfied or unsatisfied – of the clauses in which they feature. Clearly, accurate knowledge of this last set of probabilities can be used to extract satisfying allocations by asking for the variable probabilities associated with all the clauses being true. It is known that, for graphs that are tree-like (i.e. have no cycles), a procedure called belief propagation can be used to repeatedly update the values of the probabilities for a node from the values associated with all its neighbouring nodes in such a way that the probabilities converge on their correct values (see e.g. [77] for further details). Convergence occurs regardless of the initial estimates used. Unfortunately, the graphs that naturally occur in satisfiability problems turn out not to be tree-like, so belief propagation cannot strictly be applied. However, provided that the graph is approximately tree-like (i.e. has only long cycles), it can still work effectively. In the form we have described, belief propagation works well in both the heavilyconstrained and lightly-constrained regions, but fails badly as the problem instances grow harder [51] and as the size of the problem grows. This shows that the assumption that the loops are long is invalid in this region, as we would expect; the more “tangled” the problem is, the more chance there is that short loops will be formed. These results do not spell the death knell for message passing, however; they simply mean that a more cunning means needs to be found to convert the problem into a graphical model. An approach that is under active investigation is variously called the cavity method and the survey method; see [54] and [52]) for further details. This is intimately related to a formulation of the satisfiability problem as an Ising model, but regrettably we are unable to go into it further here. Chapter 1. Introduction 1.4 21 Summary So far, we have seen just how tricky NP-complete problems can be: they are characterised by a large, complicated-looking solution space with no reliable compass. We have also seen, however, that there is only a narrow range in which these problems are genuinely hard; outwith this range, even pretty simple and naı̈ve methods work well. The hardest NP-complete problems, by contrast, have exponentially many local minima in which to trap the unwary and are too tightly connected for simple belief propagation to be effective. 
This, combined with their apparently random statistical properties, explains why, for all their apparent structure, it has been so hard to beat simple brute-force search for these problems. For us, this is pretty strong intuitive evidence for assuming that NP-complete problems are indeed more complicated than problems in P. Potentially, a brilliant insight could show that a polynomial-time algorithm is possible, but we will follow our common sense and assume that it is not. That does not mean, however, that we cannot hope to find more effective tools from which to build algorithms; before this thesis is finished, we will have shown that, at least for some problems, such tools do exist.

Chapter 2

On the complexity of physical systems

Physical systems can give rise to complex and unpredictable behaviour, as weather forecasters are painfully aware; the question is whether all this complexity can be put to use, rather than just stopping play. In particular, is it possible to design a physical system so that, started off from the right initial state, it solves a hard problem simply by virtue of its evolution? Even conventional computers are based on physical systems which naturally do something useful, albeit just computing the logical AND of two binary inputs. Before we discuss this question in earnest, however, it is perhaps useful to consider an example of where this line of thought can lead.

2.1 The neon route-finder

The inspiration for this project came from a beautifully simple device, devised by Manz et al. [21], which uses the fact that electricity follows the path of least resistance to solve route-finding problems. A streetmap of London was etched into a small sliver of glass, and then a second piece was fixed over the top to create a network of tiny tubes. These tubes were then filled with neon gas and sealed. When a voltage is applied between any two points on the map, the shortest route between those points lights up like a tiny neon strip light; the route it found between Imperial College and Admiralty Gate is shown in Figure 2.1.

Figure 2.1: The shortest route between Imperial College and Admiralty Gate, as found by the neon route-finder. (Reproduced from the Imperial College Reporter.)

Our feeling, in commencing this project, was that this almost magical ability to solve minimization problems could be put to use in solving harder problems (or at least provide very rapid solutions for easy problems). However, we came to realise that, while Manz's device will always find a plausible route, it will not necessarily find the optimal route. To see why, consider the circuit shown in Figure 2.2.

Figure 2.2: Example "route-finding" circuit, driven by a 10 V supply.

As with Manz's neon tubes, this circuit uses resistance as a measure of distance; as such, its behaviour should be analogous to that of the neon tubes. The "shortest route" through this circuit is to take the leftmost branch at each junction; the problem is that this is not the path that turns out to carry the greatest current. The net resistance of a set of resistors connected in parallel is given by Kirchhoff's law, namely

1/Rnet = 1/R1 + 1/R2 + 1/R3

for resistances R1, R2 and R3. Using this, the net resistance of the leftmost of the three main branches of the circuit is 2 + 1/(1/3 + 1/30 + 1/30) = 4 1/2 Ω, as compared to the rightmost branch's total of 3 + 1/(1/4 + 1/4 + 1/4) = 4 1/3 Ω.
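The two branch resistances just quoted are easy to check numerically; the following sketch (ours) simply applies the series and parallel rules above to the resistor values given in the text.

```python
def parallel(*rs):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in rs)

# Leftmost branch: 2 ohm in series with 3, 30 and 30 ohm in parallel.
left = 2 + parallel(3, 30, 30)
# Rightmost branch: 3 ohm in series with three 4 ohm resistors in parallel.
right = 3 + parallel(4, 4, 4)
print(left, right)   # 4.5 vs about 4.33: the "longer" right branch has lower resistance
```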
This means that, if we test the current flowing through the branches at the upper branching point, we will find more flowing through the rightmost branch than the leftmost. On the other hand, we will still find the greatest current flowing out of the shorter path at the bottom of the tree.

Even this, however, will not always be the case. If we imagine "disentangling" a multiply-branching circuit into a circuit with only one branch, but many leaves (one for each path through the original circuit), we can ask what the equivalent resistance of each path is. For the branch shown in Figure 2.3, a little calculation shows that

R = R1 + R11 + R1 R11 (1/R12 + 1/R13).

Without the final term on the RHS, this equation would have given us what we wanted: the greatest current leaving a tree-like circuit would flow through the exit which corresponded to the path with the lowest total resistance. However, the final term allows this to be skewed by the rest of the circuit, to an extent that will get progressively worse as the number of branching points increases. For example, always taking the rightmost branch gives a path with an effective resistance of 13 Ω, for a "length" of 7 Ω. Taking the centre branch and then the leftmost branch, by contrast, gives an effective resistance of 11.15 Ω for a length of 8 Ω. Thus the second path, though longer, will carry more current.

Figure 2.3: A circuit with multiple branches can be considered as a circuit with only one branch, but many leaves. To do this, we need to calculate the equivalent resistance of each path through the multiply-branching circuit.

In the interests of fairness, it must be said that the odds of circuits such as these finding a reasonable path are still pretty high: we had to work quite hard in Figure 2.2 to create a situation in which it would fail, and it nevertheless found the second-shortest route. The dominant terms in calculating the resistance do come from the shortest paths, so we would only expect things to go badly wrong under quite unusual circumstances, and devices such as Professor Manz's are still likely to be practically useful. (In fact, they will have the useful property of erring in favour of paths with several reasonable routes nearby – as in our example – giving drivers some alternatives in case of mishaps.) That said, however, they do not provide a "proper" solution, in that they cannot cope with all eventualities.

An interesting side-note is that the effectiveness of this type of mechanism would have been greatly aided if Kirchhoff's law had given greater weight to small resistances; for example, a law such as

1/Rnet^n = 1/R1^n + 1/R2^n + 1/R3^n

with n > 1 would have made the circuit select the lowest-resistance path more aggressively. Since Kirchhoff's law is just a consequence of Ohm's law (V = IR for current I and voltage V) and current conservation, such a circuit could be fashioned out of materials offering non-Ohmic conductivity properties. (For the above, we would be looking for substances which obeyed V = IR^n.) Unfortunately, we are not currently aware of any such substances.

2.1.1 Discussion

The neon route finder shows that clichés like "electricity always follows the path of least resistance" need to be treated with caution. On the other hand, it also shows that the behaviour of apparently complex systems can often be described quite simply.
At the lowest level, the route-finder and its electrical analogue both involve the flow of a large number of electrons, but we did not need to worry about their individual behaviour because their relevant bulk properties were adequately described by the concept of a "flow" of current meeting a certain amount of resistance. In fact, the route finder could equally well have been built using pipes with water flowing through them, the whole device being tilted to add a gravitational potential, rather than an electrical one.

The trouble with this type of description is that it relies on there always being enough particles at any given point to approximate a continuous flow. For more complicated problems, this can potentially lead us into trouble: an exponential number of possibilities could require an exponential number of particles. Imagine, for a moment, that we have built a version of the route finder using pipes and ball bearings. At each junction, we have arranged for there to be an equal probability of the ball bouncing into each of the exit pipes, making the whole device a bit like a pinball machine. If we sent balls through the device one at a time, the time they took to come out the other side would be a measure of how far they had travelled. By sending enough balls through, we would eventually have a pretty good estimate of the shortest path, but this would require O(b^n) balls if the path involved n junctions with an average of b exits each. In other words, we would have an exponential algorithm.

If we decided, instead, to send a continuous series of balls through the device, then something quite interesting would happen: the longer paths would start to get clogged with balls, making fresh balls more likely to take the unclogged paths. In this way, as with the electrical and water-based versions, we would start to find the greatest flow along the shorter paths. Putting many balls through at once would give the device an elementary "memory" of what the best paths were (no balls mean good paths, if you like). However, this can only start to happen when the rate at which the balls are sent through the device is great enough for it to start clogging up; in short, there is power in numbers.

This fact has recently been realised by designers of so-called "ant colony" algorithms, where a large number of virtual ants explore a representation of the problem, laying down a virtual pheromone as they go. By probabilistically choosing the paths with the most pheromone, they come to follow the shortest path. This is, in a sense, the inverse of what happens in the route finder: the level of pheromone is a measure of how good a path is, whereas the level of clogging reflects how bad it is. The end result, though, is fairly similar.

2.2 General properties of physical systems

Siegelmann [34] identified three main properties that distinguish analog from digital models:

1. Analog computational models are defined on a continuous phase space (e.g. where the variables x may assume analog values), while the phase space of a digital model is inherently discrete.

2. Physical dynamics are characterized by the existence of real constants that influence the macroscopic behaviour of the system. In contrast, in digital computation all constants are in principle accessible to the programmer.

3. The motion generated by a physical system is "locally continuous" in the dynamics.
That is, unlike the flow in digital computation, statements of the following forms are not allowed in the analog setup: “tests for 0”, or “if x > 0 then compute one thing and if x < 0 then continue in another computation path”.

We are particularly interested in the first and last of Siegelmann’s properties here; we find it hard to see how her real constants (such as the gravitational constant or the charge on an electron) can do more than act as uniform scaling factors. The effects of continuity and continuous dynamics are much more important, however, as we shall see in the next two sections.

2.3 Analogue vs. digital

From the analysis of the neon route-finder, it is easy to imagine that continuity is a good thing, and that systems which can at least approximately be thought of as continuous flows are what we want to use as a basis for a computer. The physical world seems analogue, not digital, so the choice to build digital computers seems increasingly perverse; this, however, is not the whole story.

Continuous signals, manipulated using continuously varying devices, might at first appear to have almost unlimited computational power. In the same way that any number, however large, can be represented as a real number between zero and one (by encoding it as a decimal expansion), an analogue signal can, in principle, carry limitless amounts of information. In addition, by adding several different signals together to form one composite signal (much in the way that several telephone conversations can be carried over a single wire by using different carrier frequencies), calculations can potentially be carried out in parallel with no increase in effort. (This last feature is now being rediscovered in the context of quantum computing, of which more later.)

The problem with this scenario is that it only works effectively if there is no noise in the signals, and the device components perform their functions perfectly; neither requirement is likely to be true in practice. It was for this reason that Claude Shannon, from an investigation of the first general-purpose analogue computer, proposed the shift to digital computation. If a signal can only take a discrete range of values, then it becomes possible to correct a certain amount of noise: simply adjust it to the nearest appropriate value. Similarly, imperfections in the operation of the machine components can be corrected by performing a discrete series of steps, rather than a continuous shift, and correcting the signal as the computation proceeds. In this fashion, it becomes possible to perform an arbitrarily long computation on an arbitrary number of bits without error.

The important question, for us, is whether this reduction from analogue to digital is really necessary, or whether anything more can be salvaged; is there anything that an analogue computer could reliably do that a digital computer cannot? Siegelmann’s properties imply that the natural description of an analog computer is as a dynamical system, and that its capabilities depend on the structure inherent in that system. In assessing the computational power that is available, the best guide is the Church-Turing principle [35] [36], which conjectures that no physical system can perform a computation more than polynomially faster than a universal Turing machine (or a digital computer).
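Shannon’s argument for discretization can be seen in a few lines. The sketch below is our own illustration, with an arbitrary noise amplitude, number of stages and two-level code: a value is relayed through many noisy stages, once without correction and once restored to the nearest discrete level after each stage.

```python
# Sketch of Shannon's point: a signal relayed through many noisy stages drifts
# without bound, but if it is restored to the nearest discrete level after
# every stage the errors cannot accumulate (provided the per-stage noise stays
# below half the level spacing).  Noise level and stage count are arbitrary.
import random

random.seed(1)
STAGES, NOISE = 1000, 0.05

analogue = 0.7   # an analogue value being relayed stage by stage
digital = 1.0    # the same information encoded on two discrete levels {0, 1}

for _ in range(STAGES):
    eps = random.uniform(-NOISE, NOISE)
    analogue += eps                  # analogue: the noise simply accumulates
    digital = round(digital + eps)   # digital: restore to the nearest level

print(f"analogue signal after {STAGES} noisy stages: {analogue:.3f} (started at 0.700)")
print(f"digital signal after {STAGES} noisy stages: {digital:.0f} (started at 1)")
```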
Simulations of the weather or other naturally chaotic systems break this rule, because errors propagate exponentially with time, requiring exponentially increasing accuracy as the simulation time increases: the butterfly effect. On a digital computer, this means doing the simulation on a finer and finer mesh, with smaller timesteps, and handling the variables of the problem with greater precision; this all adds up to exponential consumption of the computer’s resources. Clearly, then, if we could use a chaotic system as a computer it would have greater power than a conventional computer. The problem, of course, is that the same features that make a chaotic system hard to simulate also make it hard to control: we need to set its initial conditions with exponential precision, and keep it noise-free to an exponential degree. In fact, as we shall see in Chapter 4, Heisenberg’s uncertainty principle implies that even if we were able to have zero noise and arbitrary precision, there would still be a small amount of uncertainty involved in the parameters of the system. Over time, this would propagate exponentially, meaning that there would still come a time when the system’s state was completely random. In effect, this means that any property of a physical system which can be reliably predicted can also be simulated in polynomial time, though the power of the polynomial would be Chapter 2. On the complexity of physical systems 30 unfeasibly large. As an aside, we can view this in terms of the “washing out” of information from the system; it implies that there is a finite limit to the amount of information that any physical system can store and manipulate reliably1 . This is reminiscent of Mandelbrot’s famous argument that the coastline of Britain is fractal: at most scales, this appears to be true, but it does not remain true indefinitely. The flip side of this argument is that any physical system which is easy to control (i.e. is robust to noise) is also easy to simulate. Simulation errors are equivalent to noise in the system, so simulations of such systems are self-correcting. Lloyd [20] has suggested that this fact could be put to practical use in quantum computers: by using a quantum computer to simulate a physical system which would robustly perform a given computation, much less attention need be given to making the quantum computer itself robust. These properties effectively make conventional computation by a chaotic system impractical, but they do not rule out stochastic computation of the type embodied by the neon route-finder. In fact, the chaotic nature of the system now becomes an advantage, as it helps the system to explore the search space of the problem. Provided that solutions to the problem correspond to stable attractors of the dynamical system, there is at least a chance that the system will spontaneously settle into the global minimum. There are thus two approaches to noise in a computational system: one is Shannon’s, where it is rigorously corrected before it builds up to a point where it can do any damage; the other is to use it to help kick the system out of local minima in a stochastic search2 . This is clearly related to the simulated annealing approach: by slowly reducing the noise level (by e.g. cooling the system down) we can, with luck, “freeze” it at the global equilibrium. This relies on transitions 1 If we make the transition from classical to quantum systems, this changes somewhat; we will come back to this point in Chapter 4. 
2 Strictly speaking, there is also a third: encode the problem in such a way that the noise has no effect. Topology is a good candidate for this (a noisy torus is still definitely a torus, and not a sphere), but the only practical applications which have so far been found are in the field of quantum computing [84].

to lower-energy basins being more likely than ones to higher-energy ones; given time, the system’s evolution should take it to states with lower and lower energy. As with simulated annealing, however, this can only be relied on to find a good approximate solution, especially for NP-complete problems.

In some systems, like the route finder, this idea works well, because they are inherently stable: if they are perturbed, they will tend to relax back to where they were. The difficulty is that this is a reflection of the problems that they are designed to solve: the global minimum lies in a basin, and they will return to it unless they are kicked all the way out of the basin by noise. The more complicated the search space of the problem is, and the smaller and shallower the basins are, the greater the odds are that the system will either never settle, or reach only a local minimum. In other words, this approach is a real physical implementation of a simulated annealing-type algorithm, and will have the same benefits and difficulties.

A second requirement for a dynamical system to be computationally useful is that its evolution should be computationally irreducible, i.e. that it should be impossible to say anything of value about future states of the system without explicitly simulating each step of its evolution [9]. In other words, there should be no “shortcuts” to the answer; otherwise, we can potentially build a simple simulation on an ordinary computer and beat it to the answer. This rules out systems like the route finder: as discussed earlier, a very crude description sufficed to describe the relevant details of its evolution.

These arguments add up to a somewhat depressing picture, in that physical systems appear, at best, to be capable of implementing only pretty crude types of algorithm. However, their strong suit is that they can potentially do what they do very quickly; the success of ant colony algorithms shows that careful application of simple-minded methods can be very effective.

As was discussed in the previous chapter, NP-complete problems apparently have an exponential nature, but that can manifest itself either in terms of time or space requirements. Here we see another manifestation: exponential accuracy requirements. By encoding and manipulating information in a continuous way, a perfect analogue computer could potentially perform an NP-complete calculation in polynomial time, but obliging it to work within real-world constraints would almost certainly throw an exponential spanner in the works. This insight will be a guiding light throughout the rest of this work: it is easy to be seduced by systems which could work in theory, but do they still work when real-world constraints are imposed?

2.3.1 A diversion into chaos theory

An interesting perspective on the issue of “washing out” information comes from the problem, first posed by Penrose [37], of determining the boundary of the Mandelbrot set. This set is constructed from the iteration

z_{i+1} = z_i^2 + c,

where c and the z_i are complex numbers and z_0 = 0; c is a member of the set if (and only if) the iteration remains bounded.
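In practice, membership of the set can only ever be tested approximately: one iterates a finite number of times, declares the orbit to have escaped once |z| exceeds 2, and presumes any point that survives the iteration cap to be inside. A minimal sketch of this standard test is given below; the cap of 1000 iterations is an arbitrary choice of ours, and, as the next paragraph makes precise, no finite cap is guaranteed to settle points near the boundary.

```python
# Sketch of the finite-precision membership test for the Mandelbrot set.
# Once |z| > 2 the orbit is guaranteed to diverge; points that survive
# max_iter steps are merely *presumed* to lie in the set, and max_iter is an
# arbitrary cap -- exactly the difficulty discussed in the text.

def in_mandelbrot(c: complex, max_iter: int = 1000) -> bool:
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:          # orbit has escaped: definitely outside
            return False
    return True                   # undecided after max_iter steps: presumed inside

print(in_mandelbrot(0.0))              # True: the origin is in the set
print(in_mandelbrot(1.0))              # False: escapes after a few iterations
print(in_mandelbrot(-0.75 + 0.01j))    # a point close to the boundary; the verdict
                                       # can change as max_iter is increased
```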
The boundary of this set is fractal (and arguably beautiful), but it has been shown that determining whether or not a given point lies in the set is undecidable. By considering a computer which operated on real-valued quantities, but using discrete timesteps, Blum, Shub and Smale (BSS) [85] defined analogues of the usual complexity classes, and showed that Penrose’s problem was equivalent to the halting problem. They found that there was no limit to the number of iterations that could be required to see whether or not the iteration remained bounded.

If we imagine trying to solve this problem with a physical system, we find a different problem: the inherent uncertainty implied by a real system means that we are doomed to failure. Once we come close enough to the set’s boundary, we have no way of ensuring that we have not accidentally calculated the answer for a point on the other side of the boundary. We can get round this problem to some extent by scaling the quantities we use (so that the uncertainty occurs on a scale too small to be relevant), but the scaling must become exponentially large as we come exponentially close to the boundary.

2.4 The problem of local continuity

The Total Perspective Vortex derives its picture of the whole Universe on the principle of extrapolated matter analyses. To explain – since every piece of matter in the Universe is in some way affected by every other piece of matter in the Universe, it is in theory possible to extrapolate the whole of creation – every sun, every planet, their orbits, their composition and their economic and social history – from, say, one small piece of fairy cake.
Douglas Adams, The Restaurant at the End of the Universe

Siegelmann’s property of “local continuity” implies that logical branching operations (like if . . . then . . . ) cannot be rigorously implemented. However, as we shall see in Chapter 3, it is possible to provide an acceptable approximation. For us, a more serious difficulty implied by local continuity is that changes in the system can only be felt locally, and take time to propagate to distant regions. Locality is an important and essential feature of physical systems³ – within a relativistic framework, it ensures that cause precedes effect – but it has to be artificially imposed on simulations of physical systems. Simulations are not bound by the laws of physics unless we choose to make them so.

From the introduction, it should be clear that the structure of NP-complete problems is inherently complex, making it likely that any mapping onto a physical system would benefit from non-local connections. Digital computers allow for non-locality at the cost of discrete time evolution: the computation is halted to give the signals time to get where they need to go. Allowing evolution in continuous time potentially makes the process unstable, as the system will inevitably end up acting on information that is “out of date” to some extent. As with systems of iterated equations, the choice of whether to use the most recent information or information from the previous iteration has a profound effect, especially if the system is unstable.

3 It comes as no surprise that almost all physical laws are couched in terms of differential equations.

On a more theoretical level, it is known that problems defined on planar graphs tend to be easier to handle; often, requiring that its underlying graph be planar reduces a problem from NP to P.
Clearly, it is much easier to map a planar graph onto a physical system, as many more of the edges can be mapped onto local connections; from this, it is tempting to conclude that this makes the system’s evolution less complicated. To see what would happen if we were to remove the restriction of locality, we need only look at classical physics. In the Newtonian universe, bodies feel their mutual gravitational attraction instantly; there is no conception that the influence of one body on another needs time to propagate across the gap. As a result, time is essentially just a parameter: any calculation can be done in half the time simply by doubling the speed of the particles involved. Ian Stewart [17] was able to use this to design what he called the “rapidly accelerating computer” (RAC). By means of exponentially accelerating particles, the internal clock of the RAC doubles in speed with each cycle: the first cycle takes 1/2 s, the next 1/4 s, and so on. As a result, literally any calculation can be carried out in a second or less. Even the halting problem (deciding whether or not a given algorithm will eventually terminate) would be decidable using such a computer. In saying this, of course, we have to leave out practical considerations, such as the limitless amounts of energy it would take to power such a computer, but the principle remains. With only local interactions, not only would the different parts of the computer eventually fall out of sync (when the clock rate became so fast that the signals could not cross the gap in time) but the clock-speed doubling could only be carried out a finite number of times. In other words, we would have to truncate the series that gave the exponential behaviour at a finite point, thereby yielding polynomial behaviour. In some cases, locality constrains the behaviour of a physical system so far that Chapter 2. On the complexity of physical systems 35 it is indeed possible to extrapolate from the proverbial piece of fairy cake to the state of the whole system. In others, however, this does not follow: if we imagine the space of states encompassing all physically possible configurations of the system, specifying the properties of only a small region of the system merely cuts down the size of the space; it does not reduce it to a single point. As we saw with the satisfiability problem, more freedom means more complexity; this is equally true of physical systems, as the next two subsections show. 2.4.1 The Zealot Model As an interesting illustration of the degree of freedom inherent in different physical systems, we now consider Mobilia’s “zealot model” [10]. This is based on the Ising model, which we will discuss in more detail in the next subsection. The essential idea is to consider a lattice of “voters”, most of whom have no firm convictions and are therefore inclined to listen to their neighbours. One of these voters, however, has an opinion that is fixed to the point of fanaticism, and will hold that belief no matter what: the “zealot”. The central question is how far the zealot’s opinions will propagate through the rest of the population. To make this more concrete, the voters can favour one of two points of view, coded as ±1. At each time step, an individual is chosen at random and assigned the opinion of one of its (again randomly chosen) nearest neighbours in the lattice, and the evolution of the system is followed until it reaches a steady state. The result is that, for one- and two-dimensional lattices (i.e. 
lines or flat grids), the only steady state is unanimous agreement with the zealot. However, in three dimensions the result is that the average vote is inversely proportional to the voter’s distance from the zealot. (In more than three dimensions, the average vote is proportional to r^{2−d}, where r is distance from the zealot and d is the number of dimensions.)

From this, we can see that the one- and two-dimensional models are examples of systems where the state of the system as a whole is (at least in the long term) determined by its local state. For three or more dimensions, we still get some global information, but the system is clearly far less constrained. In addition, the only results that can be extracted are statistical in nature, telling us nothing about the opinions of any one voter (except for those voters which happen to be very near the zealot).

Another interesting result for this model is that the deviation from unanimity at a given point falls off as 1/√t in one dimension, but only as 1/ln(t) in two dimensions. This means that requiring an increased degree of accuracy entails a polynomial wait in one dimension, but an exponential one in two dimensions.

From this analysis, we gain some useful intuitions. First, we see that giving a system more degrees of freedom makes it more likely that it will have a number of different minima (either all equal, as here, or some more equal than others). In essence, it becomes possible to satisfy the majority of the constraints at any given point without satisfying all of them, and in such a way that we can’t (locally) do better. Second, we see that there are at least three régimes: easy, hard (but still with a unique solution), and impossible (due to multiple solutions). In the second régime, we can guarantee to get a solution eventually (though possibly with an exponential wait), but in the third we can get irretrievably stuck, with no option but to start again. In computational terms, the first régime corresponds to easy problems, the second to apparently hard (but computationally reducible) problems, and the third to hard, computationally irreducible problems. This last class corresponds naturally to NP-hard problems: given a possible steady state of the zealot model, we can easily check it, but finding one with a particular set of additional properties would appear to involve evolving the model for an exponential amount of time.

2.4.2 The Ising model

Following on from our discussion of the zealot model, we now – as promised – consider the Ising model. This is superficially very similar to the zealot model, in that we again have a d-dimensional grid, populated by entities which can take two values, though this time, in deference to the model’s origins in solid-state physics, we call them “spins” rather than “voters”. The spins are “coupled” to their neighbours such that the total energy of the system, H, can be written as

H = -\sum_{ij} J_{ij} \sigma_i \sigma_j,

where the σ_i are the spins and the matrix J_{ij} represents the couplings. The evolution of the Ising model is essentially a physical version of simulated annealing: if flipping a spin would increase the overall energy of the system by an amount δ, the flip happens with a probability proportional to exp(−δ/kT), where T is the temperature of the system and k is Boltzmann’s constant.
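As a concrete illustration of this flip rule, the sketch below applies it to a small two-dimensional lattice with uniform nearest-neighbour couplings J_ij = J (the ferromagnetic case discussed next). The lattice size, temperature and number of steps are illustrative choices, and units are chosen so that k = 1.

```python
# Minimal sketch of the single-spin-flip dynamics just described, for a small
# two-dimensional ferromagnetic Ising model (J_ij = J between lattice
# neighbours, 0 otherwise; periodic boundaries; units with k = 1).
import math, random

random.seed(0)
L, J, T, STEPS = 16, 1.0, 1.5, 200_000

spins = [[random.choice((-1, 1)) for _ in range(L)] for _ in range(L)]

def neighbour_sum(i, j):
    return (spins[(i + 1) % L][j] + spins[(i - 1) % L][j] +
            spins[i][(j + 1) % L] + spins[i][(j - 1) % L])

for _ in range(STEPS):
    i, j = random.randrange(L), random.randrange(L)
    delta = 2 * J * spins[i][j] * neighbour_sum(i, j)   # energy change if we flip
    # Downhill (or level) flips are always accepted; an uphill flip (delta > 0)
    # is accepted with probability exp(-delta / T).
    if delta <= 0 or random.random() < math.exp(-delta / T):
        spins[i][j] = -spins[i][j]

magnetisation = sum(map(sum, spins)) / L**2
print(f"average spin after {STEPS} attempted flips at T = {T}: {magnetisation:+.2f}")
```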
This flip rule is a good, though simple, model of the thermal fluctuations in a real magnetic material.

In the simplest case, the J_{ij} are taken to be zero for spins that are not adjacent on the lattice, and J otherwise; in other words, the only interactions are between nearest neighbours and all are coupled equally. This is known as a ferromagnetic Ising model. Finding the ground state of such a model is clearly trivial (simply assign all spins to the same value), but finding the partition function is less easy. The partition function describes the probability of finding the system in a state of a given energy, and thus has to be aware of all possible states of the system. In one dimension, this is still fairly easy, and in the two-dimensional case an analytical solution is still possible (though far more difficult). In three or more dimensions, however, the problem becomes NP-complete, and no exact algorithm is known which is significantly faster than brute-force enumeration.

The next-simplest case, and the one which will hold our attention for a while, still allows interaction only between nearest neighbours, but now allows each such coupling to be either +J or −J. The non-zero interactions are still all of the same strength, but they may be positive or negative, and we are free to specify which on an element-by-element basis; this is known as a (random) Ising spin glass model. Even finding the ground state of such a model is no longer straightforward, except in one dimension, because it can now be frustrated (i.e. it is no longer possible to satisfy all the bonds). The two-dimensional case can still be solved in polynomial time, but for three or more dimensions the problem is NP-complete. The spin glass model is very heavily studied, both as an elementary model of a magnetic substance and because it has a natural relation to problems such as the satisfiability problem (as we shall see in Chapter 5), making it a natural “complexity laboratory”. In fact, this second feature means the spin glass model is largely responsible for the current application of statistical physics methods to complexity theory [55].

It is interesting to note that the NP-completeness result follows from the fact that finding the ground state of such a model can be reduced to solving a problem known as minimum cut. The task here is to take a graph with weighted edges and divide it into two distinct (unconnected) pieces by removing edges, such that the total weight of the removed edges is as small as possible. In terms of an Ising model, the vertices of the graph correspond to the spins, and the edges to the bonds; having made the cut, we assign all the spins in one piece to be “spin up”, and all the spins in the other piece to be “spin down”. To see why this represents the ground state of the problem, imagine an arbitrary cut has been made and the spins assigned. Denote by E the set of all bonds, and by E^+, E^− and E^{+−} the sets of edges which connect pairs of up spins, down spins and opposite spins respectively. We can then rewrite the total energy as

H_C = -\sum_{ij \in E^+} J_{ij} - \sum_{ij \in E^-} J_{ij} + \sum_{ij \in E^{+-}} J_{ij}    (2.1)
    = -\sum_{ij \in E} J_{ij} + 2 \sum_{ij \in E^{+-}} J_{ij}.    (2.2)

The first term is a constant of the problem, so minimizing the energy is equivalent to minimizing the second term, which is just twice the weight of the cut. Polynomial-time algorithms are known for solving the minimum cut problem on any planar graph, but it is NP-complete for non-planar graphs unless all the weights are non-negative.
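The rewriting in equations (2.1)–(2.2) is easy to verify numerically. The sketch below builds a small, randomly generated ±J instance (a hypothetical example of ours, not one taken from the text), picks an arbitrary spin assignment, and checks that the energy computed directly equals −Σ_{ij∈E} J_ij plus twice the weight of the corresponding cut.

```python
# Numerical check of equations (2.1)-(2.2): for any spin assignment, the
# energy equals -(sum of all couplings) + 2 * (weight of the cut, i.e. the
# couplings joining opposite spins).  The random +/-J instance is hypothetical.
import random

random.seed(3)
n, J = 8, 1.0
spins = [random.choice((-1, 1)) for _ in range(n)]
bonds = {(i, k): random.choice((-J, +J))          # coupling J_ik on each edge
         for i in range(n) for k in range(i + 1, n)}

energy = -sum(Jik * spins[i] * spins[k] for (i, k), Jik in bonds.items())

total_weight = sum(bonds.values())
cut_weight = sum(Jik for (i, k), Jik in bonds.items() if spins[i] != spins[k])

print(energy, -total_weight + 2 * cut_weight)     # the two numbers agree
assert abs(energy - (-total_weight + 2 * cut_weight)) < 1e-9
```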
Figure 2.4: Ising “wire”. The solid lines represent +J-weight bonds and the dotted lines represent 0-weight bonds.

Figure 2.5: Ising “AND gate”. The solid lines represent +J-weight bonds and the dotted lines represent 0-weight bonds. The inputs are on the left, and the output is on the right. The middle input must be an impurity, used to bias the result.

The difference between the two- and three-dimensional cases allows us to draw some quite revealing conclusions about the complexity of the computational problems that they can potentially solve. (The discussion that follows is broadly in the spirit of Kaye’s proof that the game Minesweeper is NP-complete [15].) The idea is that it is possible to recast elementary computer elements, such as wires and logic gates, in terms of configurations of a two-dimensional Ising spin glass, such that the result of a computation can be read off in terms of its ground-state energy.

Consider Figure 2.4. This shows an implementation of a simple “wire”, which is essentially just a one-dimensional model embedded in a two-dimensional one, “insulated” by a zero-coupling layer. Minimizing the energy of the system requires all the spins along the wire to line up, correlating the spins at either end.

Implementing a NOT gate is very similar: simply introduce a −J coupling into the wire. If we could also implement an AND gate, then we would have all we need to build arbitrary logic circuits. This, however, is not possible: the invariance of the model under inversion of spins implies that the outputs of such a gate for inputs of 1, 0 and 0, 1 would have to be different. One way to get round this is to introduce “impurities” with fixed spin, thereby breaking the symmetry, as shown in Figure 2.5. Alternatively, the model could be made dependent on the absolute values of the two spins, not just their relative values. This is most easily accomplished by adding a “magnetic field” to the model, which the spins will tend to align with. (This simply involves adding a term -B\sum_i \sigma_i to the energy function given above; this is minimized when all the spins have the same sign as the constant B, representing the magnetic field.)

In either case, we run into problems with the reduction to minimum cut: we have lost the spin-reversal symmetry and, as a result, rather than just being dependent on the cut itself, the energy is generally dependent on the interiors of the two regions. This yields a much harder problem. The only way we can see of preserving the good behaviour of the two-dimensional model is to introduce only one impurity (connected, via wires, to all the points at which impurities are required). This means that we can still view the problem as an instance of minimum cut, with the proviso that the choice as to which group of spins should be up is no longer arbitrary.

From this, it might appear that we are able to construct arbitrary circuits, but we are left with one major difficulty: to do so, we would need to find a way for the “wires” to cross. In the case of Minesweeper, this was possible (completing the proof of NP-completeness), but here it is not. On the other hand, extending the model to three dimensions allows sufficient freedom for there to be no need for wires to cross. Similarly, allowing non-local interactions (even just across the diagonals of the grid squares) is enough to allow wires to cross.
The net result of this is that it is possible to embed any conceivable logic circuit in a three-dimensional Ising model, or a two-dimensional model with non-local interactions, but only quite a restricted class of circuits in an ordinary two-dimensional model.

It is also interesting to note that, as a consequence of this ability to simulate arbitrary logic circuits, simple neural networks can also be built into an Ising model, provided that we allow the spins to have arbitrary numbers of nearest neighbours. All we need do is designate one neighbour of a given spin as the “output” and the others as “inputs”: the spin – and the “output” – will then align itself with the majority of the “inputs”. The only difficulty with this scheme is that learning would require adjusting the strength of the bonds between the “inputs” and the “neuron”, which is not a physically viable process. (For further information about the link between Ising models and neural networks, see [78].)

Leading on from this, it might appear that, by careful adjustment of the bond strengths in an Ising model, it is possible to arrange for the ground state of the model to be the solution to any given problem. However, for NP-complete problems, we cannot use the reduction to min cut as a means of finding the ground state, and we must rely on the normal evolution of the system. The problem with this is that, as we noted in the introduction, NP-complete problems generally have a number of different minima, and there is no reason to assume that the system will ever reach the global minimum.

The dimensionality of the system rears its head again when we start to think about how we could go about ensuring that we do indeed find the ground state. The problem is that, as the number of dimensions increases, so does the number of nearest neighbours per spin. To see why this is a problem, consider the simplest possible example: the one-dimensional ferromagnetic model. Imagine that the system is already at the global minimum, with all spins aligned (upwards, say). Flipping one spin is energetically unfavourable, because it takes the two bonds around the spin from a total energy of −2J to +2J, an increase of 4J. However, at finite temperature, Boltzmann statistics imply that there is still a chance that it will happen, proportional to exp(−4J/T). Once it has happened, the neighbouring spins can flip without any change in energy and we can potentially get a “domino effect” (though this will proceed like a random walk, rather than a proper domino effect). What is more, as the size of the system increases, so does the chance of the initial flip happening. In fact, it can be shown that the behaviour of the system is ergodic, i.e. that, given enough time, it will eventually visit all possible states.

By contrast, if we consider the analogous situation in a two-dimensional model, we find that the initial chance of a spin flip is much lower (due to an energy cost of 8J). In addition, the neighbouring spins still have three out of their four neighbours spinning up, so they will tend to stay aligned. It turns out that this suppression of change is enough to ensure that the two-dimensional model is no longer ergodic, and can settle into a definite state. In addition, this state need not be the ground state; all that is required is that each spin should have three or more nearest neighbours spinning in the same direction.
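The energy bookkeeping behind this argument can be checked in a couple of lines. The sketch below is our own check (J = 1, nearest-neighbour couplings only, boundaries ignored): it computes the cost of the first flip out of the all-up state, and then the cost of flipping a neighbour of the spin just flipped, in one and two dimensions.

```python
# Check of the flip costs quoted above, starting from the all-up ground state
# of a ferromagnet (J = 1, nearest-neighbour couplings only, large lattice so
# boundaries can be ignored).

def flip_cost(n_up, n_down, J=1.0):
    """Energy change for flipping an up spin with the given neighbours."""
    return 2 * J * (n_up - n_down)

for d in (1, 2):
    z = 2 * d                                  # number of nearest neighbours
    first = flip_cost(z, 0)                    # all neighbours still up
    second = flip_cost(z - 1, 1)               # one neighbour already flipped
    print(f"{d}D: first flip costs {first:.0f}J, "
          f"flipping its neighbour then costs {second:.0f}J")

# Output: in 1D the costs are 4J and then 0J (the flipped region can spread
# for free -- the random-walk "domino effect"); in 2D they are 8J and 4J, so
# both the initial flip and its spread are suppressed.
```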
Moving to three (or more) dimensions, we find more of the same: the energy cost of a spin flip is even higher (12J), and the neighbouring spins still have 5 out of 6 neighbours aligned. The condition for a state to be stable is also looser: 4 out of 6 neighbours, rather than 3 out of 4. This leads us to the seemingly paradoxical conclusion that systems with more dimensions simultaneously have more freedom (in that we can impose more complex structures on them) and are more resistant to change. This is why the three-dimensional zealot model took so long to settle, and bodes ill for the convergence of any system with similar dynamics.

2.5 Summary

In this chapter, we have seen that there is a very good reason for the desire to force analogue systems into a digital straitjacket: the need to keep error and noise at bay. This makes us sceptical about the prospects of any physical system to improve on present computers in any significant way, if we keep to the paradigm of obliging the system to perform a predefined algorithm.

On the other hand, physical systems open up the possibility of a more stochastic mode of computing, where the only requirement is that the solution be more stable than other points in the search space. We have even seen, thanks to the neon route finder, that this can work well for some problems. However, if we try to apply this approach to NP-complete problems, we find that the complicated search space embodied by such problems also makes this approach unlikely to converge exactly. On the other hand, physical systems can at least work very fast, being composed of a large number of basic elements. Given that the prospects for an efficient solution of such problems are remote, they may still be effective, at least in some cases.

Chapter 3
Novel computing devices

I wish to God these calculations had been executed by steam.
Charles Babbage to William Herschel (1821)

The history of computing is littered with all manner of weird and wonderful devices, all designed with one aim in mind: efficient calculation. After a long struggle, digital computers eventually emerged as the most flexible and reliable candidates; the aim of this chapter is to see why.

The story begins, back in 1823¹, with Charles Babbage and his “Difference Engine No. 1”, designed for the task of mechanically calculating mathematical tables (such as logarithms). Babbage had become frustrated with the number of mistakes he found in existing (hand-calculated) tables, and managed to convince the government that it could be done automatically. The task of building the machine² was monumental – it called for 25,000 precision-machined parts – and it was never finished, but it led Babbage to an even more ambitious design: the “Analytical Engine”. This was never built either – it had 50,000 parts! – but it represented the first device that we would recognise as a general purpose computer: it would have been able to store information, and could have been programmed using punched cards³.

1 The story could begin with Napier’s design of the slide rule in 1650, or even with Leonardo da Vinci’s reputed design of the first ever calculating machine, but we are interested in programmable computers.
2 Incidentally, the machine was hand-cranked, not steam-powered.
3 The first device to be “programmed” in this way was actually a loom, by Jacquard in 1820.
Starting with the possibilities presented by purely mechanical devices like these, we will move on to their electrical counterparts (chiefly Shannon’s General Purpose Analog Computer), wave computers, DNA computers, and finally an “amorphous computer”. This gives a good cross-section of the possibilities, and we finish by tying these results in with the analysis of the previous chapter.

3.1 Mechanical computers

The idea of computing using cogwheels and crankshafts may seem somewhat quaint now, but such machines provide a simple starting point in studying analog computers in general. For example, Vergis et al. [28] designed a purely mechanical device for solving linear constraint optimization problems, with a shaft for each of the variables and one for the objective function (i.e. the quantity to be optimized). Using gears (to perform multiplication) and differentials (for addition and subtraction), the positions of the shafts could be forced to obey the constraints. The simple act of turning the objective function shaft then turned the other shafts to values which gave the new function value. By turning the objective function shaft until it could turn no more, the machine could in principle find the optimum values for the variables.

The biggest problem with this machine is readily apparent: it can only find local optima. By construction, it can only follow smooth paths through the parameter space of the problem, so there is no way out of a local optimum other than reducing the objective function again and trying to “steer round” the optimum. Worse, it is only possible to find the global optimum at all if there is a path from the machine’s starting point to the optimum which never leads “uphill” at any point; Vergis et al. call this the Downhill Principle. Clearly, no machine which obeys this principle can be of any general use in solving NP-complete problems, and it seems intuitively quite likely that all purely mechanical machines must obey it. Essentially, this restricts such machines to implementing algorithms in the spirit of GSAT, with the attendant problems that brings.

The other problem with machines such as these is that they must, like Babbage’s engines, be made to extremely high precision; otherwise, paths which are only weakly downhill may appear to actually lie uphill. In addition, using the machine described above on some problems can yield awkwardly high gear ratios or fine differentials. Thus, as the size of the problem increases, the required precision will also increase.

As we have said before, local search through the parameter space of an NP-complete problem is problematic at best, but to make it work at all we have to add some means of “kicking” the system out of local optima; for this, electrical computers hold out more promise.

3.2 Electrical analog computers

The most famous analog computer was Shannon’s General Purpose Analog Computer (GPAC), the origins of which can be dated all the way back to Lord Kelvin in 1876 [79]. This consisted of a small set of basic components, which could be connected together as desired to solve the problem at hand.
These were: an adder (which outputs the sum of its inputs); an integrator (which, given inputs u and v, calculates \int_{t_0}^{t} u \, dv, where t_0 is given by the device setting and t is the current time); a constant multiplier (which multiplies its input by a predefined constant); a multiplier (which takes two inputs, u and v, and outputs uv); and a constant function (which has no inputs and continuously outputs 1). (The description given here is taken from [29].)

The GPAC was originally designed to solve differential equations, and, despite its name, this is essentially all that it is capable of: Shannon [30] showed that the class of functions that it could generate was the set of solutions of a class of systems of quasi-linear differential equations. (This was made more precise by Pour-El [31].) In particular, it is incapable of generating functions such as Euler’s Γ function [33], so it is less powerful than a Turing machine.

One feature of the function class which a GPAC can generate that is of immediate interest to us is that it must have a domain of generation, i.e. there must be some (finite) region around the initial conditions within which the solution does not change. This effectively means that the problem must allow room for imprecision in the initial conditions and later calculation. As such, we are again likely to be restricted to polynomial problems, unless we are happy to live with exponential accuracy requirements.

As an aside, let us hark back to the real-valued computers of BSS, which were able to compress the memory requirements of algorithms for real-valued problems as compared to their implementation on a digital computer. We have argued that this would not work in practice, but such computers are in principle able to provide an increase in power. It is interesting to note, for comparison, that the GPAC does not: it is possible to simulate its operation with only a polynomial slowdown [80].

Recent work [32] has shown that a relatively simple extension of the GPAC gives it substantially more power. The augmented GPAC includes a box which calculates x^k θ(x) for input x and fixed k, where θ(x) is the Heaviside step function (i.e. θ(x) = 1 for x > 0 and θ(x) = 0 otherwise). This gives the GPAC the ability to “sense” inequalities in a differentiable way, getting round Siegelmann’s third requirement. Of course, the price we pay for this is that values of x which are only slightly above zero will only be sensed very weakly, effectively imposing a lower limit on the size of the inequalities that can be sensed. Nevertheless, this allows the enhanced GPAC to compute most functions, provided only that they are computable by a Turing machine in a time bounded by a primitive recursive function of the problem size. The class of primitive recursive functions is large, and includes the exponential function, so the extended GPAC can at least compute the solution to any NP-complete problem. This result does not, however, say anything about how long it will take! In addition, as with the Mandelbrot set problem, we are likely to have to scale the quantities used, so that inequalities can be sensed and the domain of generation is large enough. Thus, even if the time required turned out to be acceptable, the accuracy and energy requirements would not be.

3.3 Artificial neural networks

Artificial neural networks are simplified models of the brain, consisting of a large number of artificial “neurons”.
These are either fully connected together, or they are formed into densely connected groups which are only sparsely linked. (This latter option is intended to mirror the brain’s organisation of neurons into functional groups.) Although real neurons are biologically quite complex, their basic operation is thought to actually be very simple. The first artificial neuron was designed by McCulloch and Pitts, and its operation is summarised in Figure 3.1.

Figure 3.1: Basic model of a neuron. The output O is 1 if \sum_i w_i I_i > T and 0 otherwise, where the w_i are the weights and T is the threshold.

Other designs have thresholds which are less sharp – a sigmoid function, for example, which is biologically better motivated – but their behaviour is similar. A good way to think of a neuron is as a detector: if it “detects” enough input, then it gives an output; otherwise it does nothing. Real neurons give out a series of spikes, but most artificial models use a “rate approximation”, i.e. their inputs and outputs are continuous-valued spiking rates. This is thought [81] to make no fundamental difference to their properties.

Another crucial feature of neurons is that their output can either be ‘excitatory’ or ‘inhibitory’ (i.e. positive or negative), with each neuron only providing one type of output. As we will see, it is only through having both types of neuron that the brain can be as powerful as it is.

Viewed from a computational angle, a neuron can be thought of as a general (and adaptable) type of logic gate. A basic, but important, result about logic gates is that only a few different types are needed in order to perform any possible computation. In fact, it is possible to use just one type of gate: the NAND gate. This has two inputs, and gives an output unless both inputs are on. For any given set of inputs and outputs, it is possible to build a network of NAND gates that, given the inputs, produces the outputs. In other words, any possible boolean function can be represented with NAND gates.

Following this line of thought, we rapidly hit a difficulty in thinking of brains as flexible computers: it doesn’t look as if one neuron could possibly represent a NAND gate. As presented, more input means more output, not less! A single neuron with suitable weights can represent some types of gate, such as an OR gate or an AND gate, but ‘negative’ gates like NAND are a problem.

This is where inhibition comes in: by connecting together inhibitory and excitatory neurons, we can do much more. Figure 3.2 shows a pair of neurons, one excitatory (e) and one inhibitory (i), which, working together, can solve problems like these.

Figure 3.2: Co-operative pair of neurons.

If, for example, we arrange for neuron i to be active only when both the inputs are (i.e. be an AND gate) and use the inhibitory output to suppress neuron e, then we can replicate another type of logic gate, XOR (which is on only if one or other input is on, but not both). The inhibitory neuron has given us the last piece of the puzzle: how to get less output from more input. Replicating a NAND gate is even easier than this: rather than connecting the inputs to neuron e, we just have to connect e to a source which is always on, making it active unless it is suppressed by neuron i.
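A minimal sketch of these ideas is given below: a McCulloch–Pitts threshold unit in the style of Figure 3.1, an XOR built from an excitatory/inhibitory pair as in Figure 3.2, and the NAND variant just described. The particular weights and thresholds are illustrative choices of ours, not taken from the figures.

```python
# Sketch of a McCulloch-Pitts threshold unit and of the excitatory/inhibitory
# pairing described above.  Weights and thresholds are illustrative choices.

def neuron(inputs, weights, threshold):
    """Threshold unit: output 1 if the weighted input sum exceeds threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

def xor(a, b):
    # Neuron i fires only when both inputs are on (an AND gate) ...
    i = neuron([a, b], [1, 1], threshold=1.5)
    # ... and its output enters neuron e with a negative (inhibitory) weight;
    # e otherwise acts as an OR gate, so the pair together computes XOR.
    return neuron([a, b, i], [1, 1, -2], threshold=0.5)

def nand(a, b):
    # As in the text: neuron e is driven by a source that is always on, and
    # is switched off only when the inhibitory AND neuron fires.
    i = neuron([a, b], [1, 1], threshold=1.5)
    return neuron([1, i], [1, -2], threshold=0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "XOR:", xor(a, b), "NAND:", nand(a, b))
```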
Much of the interest in artificial neural networks as practical computer systems (rather than toy models of the brain) stems from the fact that they do not have to be programmed in the conventional sense; they can “learn from experience” as we do. Two main mechanisms have been proposed for this: Hebbian learning and error-driven learning.

Hebbian learning (first proposed by Donald Hebb in 1949) works to emphasise the correlations between neurons, by strengthening the weights between neurons that are often active at the same time. This is a bit like making the connection between a lightning flash and a thunder clap: because they tend to occur together, we come to feel that there must be a connection between them. Eventually, we start to expect the thunder when we see the lightning. Similarly, if it so happens that a neuron is active more often than not when we see a cat, then Hebbian learning will act to press the neuron into service as a ‘cat detector’. (This is not as crazy as it may sound: the brain has many highly specialised neurons, some of which act as ‘face detectors’ for people we know well.)

Error-driven learning, by contrast, makes the brain learn by trial and error. When we do something right, then the weights between the neurons that were active in making the decision are strengthened; otherwise, they are weakened. (The mechanism for doing this – the release of a neurotransmitter called dopamine – is also what makes us feel good when we succeed.) In other words, we remember good strategies and forget bad ones.

These two strategies work well together, in that error-driven learning drives us towards good solutions to problems, while Hebbian learning drives us towards solutions that represent the essential relationships between their parts. As a result, our solutions to problems come to resemble a series of logical steps, making it much easier to generalise from situations we know about to situations we’ve never experienced before.

As a consequence of their ability to simulate Turing machines, neural networks are at least as powerful. In fact, Siegelmann has argued [82] that the class of problems that they can handle is actually P/poly. This is the class of problems which can be solved by a Turing machine if its size is allowed to be a polynomial function of the problem size. The reason for this is the same as that which gives the BSS real-valued computer more power: the weights are real-valued, so neurons can be trained to “detect” any real number. As with the BSS machine, this power vanishes as soon as we bring real-world considerations into play.

Their ability to learn gives neural networks an important advantage over the other systems we have considered: they are able to stochastically learn “heuristics” and then apply them to later problem examples. In the worst case, this cannot help – there are no regularities to be exploited – but it is of considerable practical use. For NP-complete problems, however, we conjecture that their performance will reduce to stochastic search, though of a sophisticated variety; more like WalkSAT than GSAT, perhaps.

3.4 Circuit-based “consistency computers”

The electrical circuit version of the neon route finder inspired us to try to design circuits which could solve NP-complete problems, ideally without the route finder’s rider that the answer could not be guaranteed optimal.
The overriding concern was that the optimal solutions should be stable, and give rise to the greatest current flow. As with all such stochastic methods, the odds of the optimal solution actually being found are not high, but the rate of search could potentially be very fast. These circuits borrow from artificial neural networks, in that they feature “neuron-like” elements. These are not full-blown artificial neurons, in that their purpose is to permit current to flow unless an inhibitory input is active; they could be implemented using transistors. The remainder of this section considers potential implementations of this idea for the solution of three different NP-complete problems: the travelling salesman problem, the satisfiability problem, and integer factorization.

3.4.1 The travelling salesman problem

The travelling salesman problem requires us to deal with two, potentially conflicting, requirements. On one hand, the route found must be as short as possible while, on the other hand, visiting as many cities as possible (subject to the constraint that each city be visited no more than once). To deal with these two requirements, together with the constraints, we propose the model shown in Figure 3.3.

Figure 3.3: Travelling salesman model. The lines represent wires, with the arrows representing the direction in which the potential difference across that wire encourages the current to flow. The cities – C1 to C4 – are small artificial neural networks which have the link wires as inputs and outputs.

First, the roads connecting the cities are modelled by wires, and the cities themselves by small artificial neural networks. The resistance of each wire – not surprisingly – is proportional to the length of the road it is modelling. (To make the model more useful, a variable resistor could be added to each wire and put under the control of an external computer. This would allow the model to be adapted to solve any travelling salesman problem with the same number – or fewer – of cities.)

To “persuade” the current in the model to flow through as many links as possible, each wire also has a small potential difference added to it. If the potential difference is the same across each link, and is of a suitable size, the idea is that the current will simultaneously aim for as many links as possible and as low a resistance as possible.

The neural network at each city node simply enforces the condition that each city can only be visited at most once by requiring that only one input and one output can be active at any one time. A possible network is shown in Figure 3.4.

Figure 3.4: Travelling salesman city node network.

The idea behind this network is that the regulatory neuron R serves to inhibit the activity of the network when it rises above threshold (and it is set at a suitable level to allow just two of the other neurons to be active). Given that a current is flowing through the network, this gives the network no choice but to activate one input neuron and one output neuron, which is exactly what we want.

3.4.2 The satisfiability problem

The basic structure of our model for solving this problem is as shown in Figure 3.5.
Figure 3.5: Basic structure of the satisfaction problem network, for a seven-variable problem. (The figure shows elements X_{11} to X_{7n}, arranged as n parallel sections of seven branches each.)

The idea behind this model is that each parallel section represents one disjunctive subformula, and that, by putting these in series, we can represent the whole formula to be satisfied. If we then apply a potential difference from top to bottom, the aim is to persuade the current to flow only through paths that satisfy the formula. When a particular variable appears un-negated and can be set to true in satisfying the formula (or appears negated and can be set to false), we represent this by allowing current to flow through the appropriate branch. Conversely, when the variable appears un-negated and can be set to false (or appears negated and can be set to true), we represent this by blocking the flow of current in the appropriate branch.

To enforce these conditions, we need to add two regulatory neurons per variable, one connected to all the paths where that variable appears negated, and the other connected to all the paths where the variable appears un-negated. Each regulatory neuron is then used to inhibit all the paths that it does not have as inputs. In this way, current can either flow through the paths where a particular variable appears negated (meaning that the variable is false in the solution) or through the paths where it appears un-negated (meaning that the variable is true in the solution), but not both. As with the travelling salesman problem, note that adding switches between the regulatory neurons and the paths should allow this model to be flexible enough to encode any satisfaction problem up to the size of the model.

3.4.3 Integer factorization

This section will be very short, as this idea is yet to be worked out in detail. However, the idea behind it is motivated by an MSc project proposal by Kousha Etessami [83]. This proposal points out that any factorization problem can be restated as a satisfiability problem (which should then be solvable by the model of the previous section).

In more general terms, we could think of a suitable type of multiplier circuit, and simply connect the outputs to the inputs. By using the output bits that are zero in the number to be factorized to inhibit the other bits (and vice versa) and adding “incentive” potential differences across the output bits that should be one, we can potentially persuade the circuit to settle into a state where the inputs become factors of the desired number. However, if this is to work effectively, it will require the properties of the multiplier circuit to be carefully considered.

3.4.4 Conclusions

The above models are all designed around the idea of consistency: connect the output of a problem to the input in such a way that only consistent solutions are stable. This is a very general idea, which can be applied to any problem in NP, but in the end it is just a stochastic search process. It is interesting to note that, with their “neural” features, at least the last two of these examples could also be implemented as artificial neural networks. There would be some difficulty in giving the travelling salesman system a preference for shortest routes rather than just Hamiltonian cycles, but the other two examples would not need to be altered.
The satisfiability problem network would then clearly implement a stochastic search-and-backtrack algorithm; as with so many of our examples, its main virtue would be speed rather than power.

A final point that surprised us was that the initial motivation – using electricity to “find the path of least resistance” – turned out not to be of any great help. In particular, we naïvely hoped that this would give this type of system an advantage over artificial neural networks. However, in the end, the fact that NP-complete problems do not have any useful notion of “downhill” makes any system which tries to exploit it problematic.

3.5 DNA computers

Just as DNA is able to encode all the information required to construct a living organism (be it a bacterium, a tree or a human being), it should also be possible to use it to encode partial or complete solutions to computational problems. If two strands can also be combined in such a way that the information in the combined strand represents a given logical operation applied to the information in the original strands, then we have the basis for a computational device.

This idea was first proposed by Leonard Adleman [11] in the context of the Hamiltonian path problem. Given a directed graph, a start vertex and an end vertex, the problem is to determine whether or not there is a path between the two vertices which passes through all the other vertices exactly once. Adleman’s solution was to represent each directed edge by a strand of DNA, with characteristic sequences at either end to represent the vertices. Two strands of DNA can only combine if the subsequences which bind to each other are complementary. The details are not relevant here, but suffice it to say that the characteristic sequences were designed in such a way that the sequence for a given vertex at the beginning of a strand and the sequence for the same vertex at the end of a strand formed a complementary pair (and that no other combinations were complementary). Thus a strand representing a → b could combine with one representing b → c to give one representing a → b → c.

To solve the problem, Adleman prepared a solution containing large quantities of all the “edge” strands, and then allowed them to freely combine, producing a mixture of strands representing a large number of random paths through the graph. He then used chemical means to filter out all the strands except those that began and ended in the right places, had the appropriate length and contained the characteristic sequences representing all the vertices. This left only those strands representing solutions to the problem, which could then be sequenced and thus read out.

Later work has shown that it may be possible to use this method to perform any arbitrary logic operation, and hence that universal computation may, in principle, be possible [14]. However, applying the method to combinatorial problems relies on there being sufficiently many strands to make it likely that all possible combinations are represented in detectable numbers. In particular, if the filtration process eliminates all the strands, we must be able to take it as evidence that a solution does not exist, rather than simply that the strands representing the solution did not happen to be created. This is a serious problem, since the total number of strands must typically be within a few orders of magnitude of Avogadro’s number, 6 × 10^23.
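The scale of the problem is easy to check: fixing the starting city and ignoring direction, n cities admit (n − 1)!/2 distinct tours, which is already comparable to Avogadro’s number at around 25 cities.

```python
# Back-of-the-envelope check: the number of distinct tours of 25 cities
# ((n - 1)!/2, fixing the start city and ignoring direction) is already of
# the same order as the number of molecules in a mole.
from math import factorial

tours = factorial(25 - 1) // 2
print(f"{tours:.3e} tours of 25 cities vs Avogadro's number ~ 6.0e23")
```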
If we were to apply this method to a travelling salesman-type problem, we reach this number of possibilities with only 25 cities, considering only tours that visit all cities exactly once. With this method, the initial generation phase will also generate many tours which do not meet even this criterion; in fact, we would expect the majority of strands to fail. As a result, applying the method to even 25 cities might be somewhat over-optimistic.

It should be noted that we are not saying that there is nothing of value in a DNA-based approach; it is able to provide massive parallelism, and this potentially outweighs the cost of the rather laborious process that computation entails. What we are saying is that it can only be of incremental benefit in solving combinatorial problems.

3.6 Amorphous computers

Unlike the other examples in this chapter, amorphous computers are built from ordinary computer hardware. (For further details, see [86].) The key idea behind them is to have, rather than one big computer, many small ones, connected via radio-frequency links. Each computer can do very little on its own, and the links are deliberately weak, to ensure that they can only communicate with others in their local neighbourhood. The driving image is of ants in a colony, cells in a body or neurons in a brain.

The benefits of such systems are that the elements are cheap to produce, and can be added or removed easily. In fact, the algorithms they run are intended to mimic the behaviour of their biological counterparts in being insensitive to the number of elements or their connections; order should spontaneously emerge from the chaos. As a result, they should be intrinsically fault-tolerant, and capable of many of the same feats as the human brain. (In fact, amorphous computers could easily simulate artificial neural networks.) In light of the previous analysis, this approach holds nothing really new, though it may well be of practical use.

3.7 Summary

In the end, all the physical systems we have considered boil down either to stochastic local search, or to a more flexible implementation of ordinary computer circuits. Potentially, this can be done very fast, and adaptability is practically very useful, but none of them give us anything fundamentally new. This is in sharp distinction to quantum systems, which we will discuss in the next chapter.

Chapter 4

Quantum computers

Quantum computing will not replace classical computing for similar reasons that quantum physics does not replace classical physics: no one ever consulted Heisenberg in order to design a house, and no one takes their car to be mended by a quantum mechanic.
Andrew Steane, quoted from [22]

As computers grow ever smaller (and ever faster), quantum mechanical effects become increasingly important, even for conventional computers. The strange effects this introduces (such as electrons potentially tunnelling their way into different circuits) might appear to be more of a hindrance than a help, something to be fought against. However, this is not the case, as Feynman [23] and Benioff [24] independently realised in 1982. Feynman’s reasoning is particularly relevant here: he realised that there are some quantum systems which could not be simulated by a classical computer without an exponential slowdown. This is in direct contradiction with our conclusion (see Chapter 2) that physical systems can only cause polynomial slowdowns, and throws the Church-Turing thesis into doubt.
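To see the scale of the problem Feynman had in mind: the joint state of N two-level quantum systems is specified by 2^N complex amplitudes (a point we return to in Section 4.1.6), so storing it directly on a classical machine quickly becomes infeasible. A rough illustration, assuming 16 bytes per complex amplitude:

# Memory needed to store the full state of N two-level quantum systems,
# assuming one 16-byte complex double per amplitude (illustrative only).
for n in (20, 30, 40, 50):
    amplitudes = 2 ** n
    print(f"{n} two-level systems: {amplitudes:.2e} amplitudes "
          f"~ {amplitudes * 16 / 1e9:.2e} GB")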
This revelation led Deutsch [25] to the specification of a universal quantum computer, the quantum equivalent of a Turing machine. This then stimulated an avalanche of research into the possibilities of computers based on quantum mechanical principles, culminating in the discovery that, for some problems, it was indeed possible to gain an exponential speedup. Initially, the problems that were 60 Chapter 4. Quantum computers 61 tackled were quite esoteric, but in 1996 Shor [4] sent shockwaves through the community by showing how to factor large numbers in polynomial time. Since then, however, little progress has been made, and the repertoire of problems for which useful quantum algorithms have been found remains small. This has led many – such as Steane, above – to conclude that quantum computers will only be of limited use. In this chapter, we first introduce the basic ideas of quantum mechanics, before going on to discuss the progress that has been made in designing powerful quantum algorithms. This will lead us to an assessment of the benefits and problems involved in putting quantum mechanics to use in computation, and on to the seat of its power. There is hardly room in this thesis for us to do more than provide the most cursory overview of the subject, highlighting the points we need. For a more detailed introduction to quantum mechanics, see e.g. [26], or, for one with a computing focus, see [27]. 4.1 Quantum mechanics, a brief introduction Quantum mechanics is, at its heart, mathematically quite simple. Even though its consequences are strange and counterintuitive, the story begins with very little more than an ordinary classical wave equation. The reason why quantum mechanics appears strange is that this equation applies not just to waves, but also to particles: the celebrated wave-particle duality. However, once we come to terms with this, there is really little more about the theory that is different to ordinary classical theory. 4.1.1 Wave-particle duality We are all familiar with the idea that waves can be made up of particles; after all, water waves are in fact composed of discrete water molecules. Provided that Chapter 4. Quantum computers 62 the number of particles involved is high enough, we can define a “wave function” in terms of the expected number of particles at any given point which will be a good approximate representation of the system’s overall behaviour. The only real difference is that quantum mechanics postulates that this will continue to be a good representation even when the number of particles drops drastically. In fact, it states that it is a good description even of a single particle. In other words, the probability distribution that we previously viewed as a good tool has to be accorded some sort of physical reality. The energy of a simple harmonic oscillator (such as electromagnetic radiation) is proportional to the square of its amplitude and inversely proportional to its wavelength. Accordingly, the appropriate wave function to use here is one where the square of the amplitude gives a probability (as this scales naturally into an expected number of particles and hence an expected energy). Conversely, in giving a particle-like interpretation to electromagnetic waves, it is natural to assume that the energy of the resulting “wave packets” should be inversely proportional to their wavelength. This results in the Planck hypothesis E = hν for the energy of photons, where h is Planck’s constant. 
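As a rough numerical illustration of the Planck hypothesis (the constants are standard values; the 500 nm wavelength is just an example of visible light):

# Energy of a single photon via E = h * nu, with nu = c / wavelength.
h = 6.626e-34        # Planck's constant, J s
c = 3.0e8            # speed of light, m / s
wavelength = 500e-9  # 500 nm, i.e. green light (illustrative choice)
energy = h * c / wavelength
print(f"E = {energy:.2e} J")   # roughly 4e-19 J per photon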
In short, quantum mechanics describes everything in terms of particles with an associated probability wave function; for high particle density (e.g. electromagnetic radiation), this yields a classical wave theory. As a result, we extend the Planck hypothesis to all particles, not just photons. One further result that we shall need, due to de Broglie, comes from reconciling this with Einstein’s special relativity. Knowing that E = mc2 , and also that c = νλ for photons, we find hν = hc = mc2 = (mc)c = pc, λ where p is the momentum of the photon. From this, we have p = h/λ. De Broglie hypothesised that, since we are viewing everything in the same terms, this ought to apply to material particles as well. 63 Chapter 4. Quantum computers 4.1.2 The Schrödinger wave equation The Schrödinger wave equation is simply a variation on a classical wave equation, used to impose energy conservation on the waves. For example, the classical equation for waves propagating along a string is 2 ∂ 2 Ψ(x, t) 2 ∂ Ψ(x, t) = v , ∂t2 ∂x2 where Ψ(x, t) represents the displacement of the string at position x and time t, and v is the speed of the wave. This is derived as a consequence of Newton’s F = ma, as applied to elements of the string (i.e. relating the acceleration of a string segment to the net force on it due to the rest of the string). The most general solutions to this equation are plane waves with amplitude A, wavelength λ and frequency ν, usually written in the form Ψ(x, t) = A cos(kx − ωt), where ω = 2πν and k = 2π/λ. In terms of these waves, the wave equation can then be seen as imposing the condition v 2 k 2 = ω 2 on the waves, which can be rewritten in the more usual form v = νλ. Seen from this perspective, the Schrödinger wave equation – the cornerstone of quantum mechanics – simply imposes conservation of energy on a very similar sort of wavefunction. For waves of the form given above, we find −∂ 2 Ψ(x, t) = k 2 Ψ(x, t). ∂x2 Using de Broglie’s relation, we find k 2 = (2π)2 /λ2 = (2π)2 p2 /h2 . The kinetic energy of a particle of mass m can be written as 1/2mv 2 = (mv)2 /2m = p2 /2m, so we can write à h − 2π !2 1 ∂ 2 Ψ(x, t) = KΨ(x, t), 2m ∂x2 where K is the kinetic energy of the “particle” associated with the wave. Using V (x) to represent the potential energy of the particle at point x, we then have à h − 2π !2 1 ∂ 2 Ψ(x, t) + V (x)Ψ(x, t) = EΨ(x, t), 2m ∂x2 Chapter 4. Quantum computers 64 where E is the total energy of the particle. By treating E as being fixed, this can be seen as imposing conservation of energy; it also happens to be Schrödinger’s wave equation, in its time-independent form. Up to this point, we have been cheating slightly: the wavefunctions we have been considering have been real-valued quantities, whereas true quantum-mechanical wavefunctions are complex-valued. While this is important, the same general phenomena are present in both cases. (The idea of a complex-valued wave is not, in fact, intrinsically quantum mechanical: in classical electromagnetism, electromagnetic waves are also complex.) Let us now, however, correct this solecism and consider the complex wave Ψ(x, t) = A cos(kx − ωt) + iA sin(kx − ωt) = A exp(i[kx − ωt]). The same arguments still apply, but now we also have ih −∂Ψ(x, t) = hωΨ(x, t) = 2πhνΨ(x, t). ∂t Recalling the Planck hypothesis, E = hν, this allows us to rewrite the Schrödinger equation as à h − 2π !2 1 ∂ 2 Ψ(x, t) ih ∂Ψ(x, t) + V (x)Ψ(x, t) = , 2 2m ∂x 2π ∂t which is its full, time-dependent form. 
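A quick numerical sanity check of this equation for a free particle (V = 0) is shown below. It is only a sketch, in units where both the mass and h/2π are set to one (so that ω = k²/2), with finite differences standing in for the derivatives; the point is simply that a plane wave with the de Broglie relation built in does satisfy the time-dependent equation.

import numpy as np

# Check that a free-particle plane wave satisfies the time-dependent
# Schrodinger equation, in units where hbar = m = 1 (so omega = k**2 / 2).
k = 3.0
omega = k ** 2 / 2
psi = lambda x, t: np.exp(1j * (k * x - omega * t))

x, t, h = 0.7, 0.2, 1e-5
lhs = 1j * (psi(x, t + h) - psi(x, t - h)) / (2 * h)                 # i d(psi)/dt
rhs = -(psi(x + h, t) - 2 * psi(x, t) + psi(x - h, t)) / (2 * h**2)  # -(1/2) d2(psi)/dx2
print(abs(lhs - rhs))   # effectively zero: the two sides agree to discretisation error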
4.1.3 Quantum states As with all wave theories, it is natural to think of decomposing the wavefunction, writing it as a sum of terms in a particular basis. With a sound wave, the natural basis set is the set of all sine waves, and this allows us to picture a general sound wave as being made up of pure tones of different frequencies, all playing together. In quantum mechanics, the natural basis set comprises the possible states of the 65 Chapter 4. Quantum computers particles involved in the system. Using Dirac’s notation, these are usually written as |Ψi, so we could say Ψ = a|Ψ1 i + b|Ψ2 i if the system had only two possible states, Ψ1 and Ψ2 . Note that |Ψ1 i and |Ψ2 i are just shorthand for particular quantum waves, but wave-particle duality also implies that they represent particular states of the system. As mentioned above, we obtain a probability distribution from the square of the wave function. Given that the wave function is complex , however, this actually means multiplying it by its complex conjugate; for the plane wave above, Ψ∗ Ψ = (A exp(i[kx − ωt]))(A exp(−i[kx − ωt])) = A2 , where Ψ∗ represents the complex conjugate of Ψ. States also have complex conjugates, denoted by hΨ|, so we could say Ψ∗ Ψ = (a|Ψ1 i + b|Ψ2 i)(ahΨ1 | + bhΨ2 |) = a2 hΨ1 ||Ψ2 i + b2 hΨ1 ||Ψ2 i + ab(hΨ1 ||Ψ2 i + hΨ1 ||Ψ2 i). Sets of basis states are almost always chosen to be orthonormal , meaning that hΨi ||Ψj i is 1 if i = j and zero otherwise. This reduces the above to just a2 + b2 . Since we are implicitly considering all of space, and the total probability of the system being somewhere is one, we must have a2 + b2 = 1. (In what follows, however, we will at times neglect such normalisation factors, in the interests of clarity.) More generally, hΨ1 ||Ψ2 i is the probability that system is both in the state |Ψ1 i and the state |Ψ2 i. The total energy of a quantum system, is often called the Hamiltonian, H, by analogy with classical mechanics, and the Schrödinger equation written as ∂ Ψ = −iHΨ. ∂t At first, it appears that we have thrown something away by doing this, but it encapsulates everything that we need for thinking about quantum computers. 66 Chapter 4. Quantum computers Once we have determined a set of basis states, we can represent the Hamiltonian as a matrix, operating on vectors in the space of basis states, e.g. H(t) = i(t) j(t) k(t) l(t) , and hence to write i(t) j(t) k(t) l(t) a|Ψ1 i b|Ψ2 i = (ai(t) + bj(t))|Ψ1 i + (ak(t) + bl(t))|Ψ2 i. The only fundamental requirement on the Hamiltonian is that it should be selfadjoint, i.e. that H† = (HT )∗ = H, where HT is the transpose of H. The reason for this can be seen if we try to interpret hΨ2 |H|Ψ1 i. This can be described as the probability that operating on |Ψ1 i with H will yield the state |Ψ2 i. The problem is that it can also be described as the probability that operating on hΨ2 | with H will yield the state hΨ1 |. Given that hΨ2 | is the complex conjugate of |Ψ2 i and that we are now operating on the right rather than the left, consistency requires that H be self-adjoint. If we evolve our quantum system forward in time by a small amount ∆t, the Schrödinger equation gives its state approximately as Ψ(t + ∆t) = (I − iH∆t)Ψ(t) = U(∆t)Ψ(t), where we have introduced the time evolution operator U. In the limit as ∆t → 0, U is unitary, i.e. U † U = 1; this ensures that (U(t)Ψ)∗ (U(t)Ψ) = Ψ∗ U† (t)U(t)Ψ = Ψ∗ Ψ = 1, i.e. that the wavefunction has a consistent interpretation over time as a probability amplitude. 
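The two facts just used – that a self-adjoint Hamiltonian gives a unitary evolution operator, and that unitarity preserves the total probability a² + b² – are easy to verify numerically. The sketch below uses an arbitrary two-state Hamiltonian (not one drawn from the text), with the sign of the exponent fixed by the I − iH∆t expansion above; conventions in the literature differ by a sign.

import numpy as np
from scipy.linalg import expm

# An arbitrary two-state example: H is self-adjoint, so U = exp(-iHt)
# is unitary and preserves the total probability |a|^2 + |b|^2.
H = np.array([[1.0, 0.5 - 0.2j],
              [0.5 + 0.2j, -0.3]])            # H equals its conjugate transpose
assert np.allclose(H, H.conj().T)

t = 0.7
U = expm(-1j * H * t)
print(np.allclose(U.conj().T @ U, np.eye(2)))   # True: U is unitary

psi = np.array([0.6, 0.8j])                     # a|Psi1> + b|Psi2>, norm 1
print(np.vdot(psi, psi).real)                   # 1.0
print(np.vdot(U @ psi, U @ psi).real)           # still 1.0 after evolution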
If the Hamiltonian does not change with time, we can write U(t) = exp(itH) for macroscopic times; this is the usual situation in quantum computers. 67 Chapter 4. Quantum computers 4.1.4 Operators and Measurement With each observable quantity (such as position and momentum), we associate a quantum operator. Thinking in terms of the vector of basis states introduced above, we can think of the operator as a matrix. The values that the quantity can take on are given by the eigenvalues of the operator. Thus, for example, if we have an operator A and we find that 1 AΨ = Ψ 2 then we say that, for the wavefunction Ψ, the quantity associated with the operator A has the definite value 21 . Eigenstates of a given operator have a definite value for the associated quantity and are undisturbed by the act of measuring that quantity. Otherwise, the act of measurement causes the wavefunction to be projected down onto one of the eigenstates of the operator. This may seem odd, as the set of eigenstates forms a basis set, in terms of which any wavefunction may be expressed, so we would expect a sum of states to be the result; this is a central problem in quantum mechanics, of which more later. When measurement does disturb the wavefunction, measurement will not always yield the same value. In this case, we can only define an average, or expectation value for the operator, given by hAi = hΨ|A|Ψi. From the above, this represents the probability that the wavefunction will emerge from the operation unchanged. Although operators all have an associated set of basis states, the sets are not necessarily the same. When two operators, say A and B, do share a set of basis states, this means that the effects of applying AB and BA are the same; in other words, they commute. In consequence, the values for a set of quantities which are associated with operators which form a commuting set can all be measured without disturbing the state; we call this a set of observables. If we attempt to measure quantities associated with operators which do not commute, the second Chapter 4. Quantum computers 68 measurement will disturb the state found by the first, meaning that the first measurement will no longer be true of the current state. 4.1.5 Quantisation and spin Many quantities in quantum mechanical systems turn out to take only a discrete set of values. For example, electrons can only occupy a discrete set of orbits round an atom. This means that there are only a finite set of possible energy “jumps” that electrons can make, so atoms can only absorb or emit radiation (i.e. photons) with particular energies, giving rise to a characteristic set of spectral absorption/emission lines. The reason for this is that the electron must now be thought of as being “smeared out” round its orbit. Since the potential energy at all points of a circular orbit is the same, the resulting wavefunction looks like a plane wave bent round into a circle. As a result, demanding that the wavefunction is smooth and continuous means that the only allowable orbits are those whose length is an integer multiple of the electron’s wavelength at that energy. As well as orbital angular momentum, electrons also have an intrinsic angular momentum, called spin; it is convenient – though technically wrong – to think of this as being due to the electron spinning on its axis. Like its orbital angular momentum – and for similar reasons – this too is quantised, taking values of either h/4π or −h/4π, which we will refer to as “spin up” and “spin down”. 
(We will denote these by | ↑i and | ↓i respectively.) In fact, all particles have spin, and it is always some multiple of h/2π; as a result, electrons are said to be “spin- 21 ”. All common material particles as spin- 21 , and even the more exotic types have a spin which is an odd multiple of 12 . By contrast, all massless particles (such as photons) have integral spin; the photon is “spin-1”. With spin, we assciate three operators, one for each orthogonal direction: S x , Sy and Sz . It is usual, in considering problems involving spin, to use the eigenstates 69 Chapter 4. Quantum computers of Sz as a basis, giving us h 1 0 Sz = 4π 0 −1 for the spin of an electron. We also find that h 0 1 , Sx = 4π 1 0 ih 0 −1 Sy = . 4π 1 0 (These three matrices, without their factors of h/4π, are known as the Pauli spin matrices.) The important point to note about Sx and Sy is that, if the electron is in an eigenstate of Sz , then both their expectation values are zero. In other words, if the electron is definitely spin up with respect to the z-axis, the results of a measurement along either of the other axes will be exactly even. 4.1.6 Quantum parallelism, entanglement and interference An important feature of wave equations, classical and quantum alike, is that they are linear , meaning that, if Ψ1 and Ψ2 are both solutions, then so is Ψ1 + Ψ2 . This means that, in principle, arbitrarily many solutions can be superposed (i.e. added together), manipulated simultaneously, and then the results “read out”, by Fourier decomposition or otherwise. The problem with this is that, since the wavefunction of a system represents the probability of finding it in a given state, the accuracy of the results depends on how accurately the observed particle counts match the underlying wave function. Thus, the greater the superposition, the greater the number of particles required for this method to give an accurate result. That said, however, careful application of this general principle is at the heart of all known powerful quantum algorithms. There is an important difference between superpositions in quantum theory and those in classical theories: quantum superpositions can be entangled . In a classical wave system, the waves are the most fundamental objects; in quantum 70 Chapter 4. Quantum computers mechanics, they can represent the joint state of a (possibly large) number of particles. This means that we can have wavefunctions such as Ψ = a| ↓↓i + b| ↓↑i + c| ↑↓i + d| ↑↑i for the joint state of a pair of spins. The reason why states such as this are called entangled is because measurement of one of the particles affects the probability of finding the other in a particular state. For example, measuring the first particle as being spin down means that the probability that a measurement of the second particle would also yield spin down is a2 ; measuring the first particle as being spin up changes this to c2 . In other words, operations carried out on one particle have an effect on all the particles. This does not apply just to measurements, but to any operation applied to a subset of the particles. For example, a commonly-used operation is the Walsh-Hadamard transformation 1 1 1 , H= √ 2 1 −1 √ which transforms the state | ↑i into 1/ 2(| ↑i + | ↓i). If this is applied to the first √ particle of a pair in the state | ↑↑i, it yields 1/ 2(| ↑↑i + | ↓↑i). This means that, for example, applying it successively to each particle in an N -particle system yields a superposition of all 2N possible states. 
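This last claim is pure linear algebra and can be checked directly; the sketch below (with N = 4 chosen arbitrarily) applies the Walsh-Hadamard transformation to each of N spins that start in |↑ · · · ↑⟩ and confirms that every one of the 2^N basis states ends up with the same amplitude.

import numpy as np
from functools import reduce

# Walsh-Hadamard transform applied to each of N spins, all initially "up".
H = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)
up = np.array([1.0, 0.0])                   # the state |up> as a vector

N = 4
H_all = reduce(np.kron, [H] * N)            # H applied to every spin
state = reduce(np.kron, [up] * N)           # |up up up up>
result = H_all @ state

print(result)                               # every amplitude equals 1/sqrt(2^N)
print(np.allclose(result, np.full(2 ** N, 1 / np.sqrt(2 ** N))))   # True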
This leads us on to the other important aspect of entanglement: that it exponentially increases the size of the space of states. For N particles, we can have a superposition of 2N terms where, by contrast, N waves allow a superposition of only N terms. The reason for this increase is that each particle in the quantum system is allowed to, in a sense, be in several states simultaneously; with several particles, this leads to a combinatorial explosion. The other important feature of quantum systems, which is exploited by a number of quantum algorithms, is interference. As with classical waves, quantum waves which are divided and then recombined can interfere with each other. The canonical example of this is the two-slit experiment, which can be carried out equally well with water waves, laser beams or electron beams. Chapter 4. Quantum computers 4.1.7 71 Heisenberg’s uncertainty relations A further consequence of wave picture at the root of quantum mechanics is that, rather than talking about absolute positions and momenta, we must talk about distributions: the wavefunction we have been talking about so far can be written as an integral – in fact, a Fourier transform – over the particle’s momentum distribution. Similarly, we could have chosen to define the wavefunction in momentum space rather than coordinate space, in which case it would have been a Fourier transform over the particle’s space probability distribution. A consequence of these two distributions being Fourier transforms of each other is that compressing one distribution causes the other to spread out: the more precisely the particle is confined in space, the greater the uncertainty in its momentum, and vice versa. As a result, there is a minimum limit to the combined uncertainty in position and momentum, given by ∆x∆p ≥ h/2π – one of Heisenberg’s uncertainty relations. Another way of looking at this result is as implica- tion of the fact that the operators for position and momentum do not commute; measuring one disturbs a measurement of the other. In fact, Heisenberg’s relations apply to any pair of “complementary” variables (i.e. variables which can be defined as Fourier transforms of each other), and we can write down similar relations for energy and time, for example. It is perhaps worth reiterating here that this is, again, a general feature of wave theories; a sharp kink in a string also leads to waves with a wide variety of different wavelengths, and the only way to restrict the system to waves of one wavelength is to start from a displacement distribution spread over the whole string. The only reason that this seems strange is because what we measure are particles, not distributions. 4.1.8 The problem of measurement As is hopefully becoming clear, the strangeness of quantum mechanics has little to do with its mathematical formalism – which is that of an ordianary wave theory, Chapter 4. Quantum computers 72 though admittedly applied to complex-valued wavefunctions – and everything to do with the fact that what we actually observe are particles, with a definite position and momentum. This is difficult to reconcile with the rest of the theory, and has been the topic of much debate over the years; there is still no definitive answer. In the early days of quantum theory, the dominant interpretation was that of Niels Bohr, now called the Copenhagen interpretation. 
This stated that, on measuring, say, the position of a particle the distribution “collapses” to reflect a definite position, with a probability given by the square of the wavefunction at that point. (In the process, this also introduces a massive uncertainty in the particle’s momentum.) Mathematically, we would expect that the operator representing the measurement should yield a superposition of states; the Copenhagen interpretation states that all but one term collapses. This leads to the correct numerical results, but suffers from being unmotivated by anything else in the theory. The Copenhagen interpretation has also led many to assume that there is something special about conscious observers, and to postulate various theories about what constitutes a conscious observer, and whether other conscious observers can be said to be in a superposition relative to ourselves before we observe them. To be fair to Bohr, this is not what he meant: any process by which information about a particle in superposition “leaks out” into the rest of the world counts as an observation. For example, air molecules that come near enough to be affected by it count as observers, because we could then – in principle at least – make measurements on them to determine the state of the particle, without having to do anything to the particle itself. Consciousness has nothing to do with it: we are simply one means by which information can be recorded and transmitted. The other commonly-held view is the “many worlds” interpretation, first proposed by Everett. In this view, the wave function does not “collapse”; measurement does indeed yield a superposition, and simply entwines the observer with the observed. Thus, observing a particle in superposition results in another superposition, each term representing the particle being observed to be in a different position. The Chapter 4. Quantum computers 73 terms are almost certain to evolve separately from then on, meaning that each term can be thought of as a separate “world”, independent of all the others. The many worlds interpretation leads to essentially the same consequences in terms of what we observe: all the other worlds are independent, so there is no way to tell whether they are there, or whether the wave function collapsed. The consequences are not exactly the same, because there is a very short period after an observation when the terms in the superposition are still sufficiently similar to each other for interference to be possible. In practice, however, this period is so short (about 10−20 s) that no experiments have yet been able to detect it. This interpretation resolves the problems of the Copenhagen interpretation, in that it adds essentially nothing to the theory, but it does paint a very strange picture. On a personal note, we feel that mere strangeness is no reason to dismiss a theory, and tend to prefer it as being the more natural of the two in other ways. However, it is largely a question of taste at this stage: the lack of any practical differences in their predictions make them equally valid. 4.1.9 Quantum computers The quantum analogue of the bit is the qubit. In ordinary computers, bits are represented by physical systems which can be in one of two states; the first computers used vacuum tubes, modern ones use tiny capacitors. Quantum computers typically use electron spin (representing 1 by “spin up” and 0 by “spin down”, for example), or the polarization of a photon (left circularly polarised for 1, right circularly polarised for 0, for example). 
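Whatever the physical carrier, a single qubit is simply a normalised two-component complex vector in the basis {|0⟩, |1⟩}; the particular amplitudes in the sketch below are arbitrary illustrative values.

import numpy as np

# A qubit state a|0> + b|1> is a normalised complex 2-vector.
zero = np.array([1, 0])   # e.g. "spin down", or right circular polarisation
one = np.array([0, 1])    # e.g. "spin up", or left circular polarisation

a, b = 0.6, 0.8j
psi = a * zero + b * one
print(abs(a) ** 2 + abs(b) ** 2)   # 1.0: probabilities of measuring 0 or 1
print(np.vdot(psi, psi).real)      # the same normalisation, in vector form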
Just as with bits, these can take on two possible values; now, though, they can do both simultaneously. Designing quantum logic gates is made more difficult by the fact that quantum operators must be unitary, and hence must be reversible. Ordinary logic gates, such as AND, throw away information – knowing that the AND of two inputs is 1 does not tell you what the inputs were – so they cannot be implemented in a reversible manner. Thankfully, it turns out that universal computation is 74 Chapter 4. Quantum computers A A∧B ¬A ∧ B A ∧ ¬N B A∧B Figure 4.1: Fredkin’s “billiard ball” model of computation. The presence or absence of a ball represents a value of 1 or 0, respectively, so this implements a two-input, four-output reversible logic gate. still possible; we just need to use different gates. An entertaining demonstration of this was discovered by Fredkin [61], which takes advantage of the fact that a classical collision between two hard spheres diverts each one from the path it would otherwise have taken; this is known as the “billiard ball” model, and is illustrated in Figure 4.1. Among the outputs of the “gate” is the AND if its inputs; the point is that it also has other outputs which, between them, allow the inputs to be uniquely determined. It is easy to see, for example, how the addition of a pair of “mirrors” would allow the original trajectories to be recovered. Fredkin was able to show that, with enough balls and mirrors, any logical operation could be accomplished; the only problem is that the system would generate a lot of useless output as well. Fredkin also designed the first reversible quantum gate, which now bears his name. This has three inputs and three outputs. The first input is a “control” and is output unchanged; the other two inputs are swapped if the control is 1 and output unchanged if it is zero; the truth table is shown in Table 4.1. The crucial aspect of this gate is that the set of outputs is a permutation of the set of inputs; in fact, a second application of the gate would undo the effect of the first one. The idea of having one or more “control” bits is common to many quantum gates. For example, Toffoli devised a gate with two controls, and a third bit which is flipped only if both of the controls are 1; the truth table is shown in Table 4.2. 75 Chapter 4. Quantum computers Inputs Outputs i1 i2 i3 o1 o2 o3 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 1 0 1 1 1 0 0 1 0 0 1 0 1 1 1 0 1 1 0 1 0 1 1 1 1 1 1 1 Table 4.1: Truth table for the Fredkin gate. Input i1 acts as the control. Inputs Outputs i1 i2 i3 o1 o2 o3 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 1 0 1 1 1 0 0 1 0 0 1 0 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 0 Table 4.2: Truth table for the Toffoli gate, also known as the “controlled-controlledNOT” (CCNOT) gate. Inputs i1 amd i2 act as the controls. Chapter 4. Quantum computers 76 The most commonly considered gates apart from these two are the “controlledNOT” (CNOT)1 and a single qubit phase rotation. Deutsch [25] showed that it was possible to choose a Hamiltonian for a system (i.e. set the potential field appropriately) so as to make its evolution perform any unitary transformation2 . In particular, he showed that any unitary transformation could be effected by a finite set of “quantum gates”. A suitable “universal set” is given by the CNOT gate and a single qubit phase rotation, for example, making this result analogous to the situation for ordinary computers. 
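The reversibility requirement is easy to check mechanically. The sketch below builds the Toffoli (CCNOT) truth table of Table 4.2 and the Fredkin truth table of Table 4.1 as ordinary functions, and confirms that each is a permutation of its inputs and its own inverse.

from itertools import product

# Toffoli (controlled-controlled-NOT): flip the third bit iff both controls are 1.
def toffoli(a, b, c):
    return (a, b, c ^ (a & b))

# Fredkin (controlled swap): swap the last two bits iff the control bit is 1.
def fredkin(a, b, c):
    return (a, c, b) if a else (a, b, c)

inputs = list(product((0, 1), repeat=3))
for gate in (toffoli, fredkin):
    outputs = [gate(*x) for x in inputs]
    print(gate.__name__,
          sorted(outputs) == sorted(inputs),          # a permutation, hence reversible
          all(gate(*gate(*x)) == x for x in inputs))  # and self-inverse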
Given this, we are free to consider algorithms specified in terms of arbitrary unitary transformations, and to take it for granted that they can be implemented efficiently. 4.1.10 Approaches to building quantum computers Although Feynman and Benioff both came up with the concept of a quantum computer, their designs are very different. If we imagine a basis of states that corresponds naturally to the possible states of the problem to be handled, the simplest option is to apply a unitary transformation which transforms each state into its successor in the computation. Applying this (time independent) Hamiltonian for a suitable length of time will then drive the initial state of the computer to the solution state. The problem is that, to remain unitary, this transformation must be a permutation, i.e. every state must have a successor, including the solution state. Thus, unlike an ordinary algorithm, where the solution state is a “halt state”, we have to define a successor for it. Benioff’s solution to this was essentially to give the computer a “clock”, applying a series of three operators in a cycle. This way, each operator was not required to be a permutation; like the Walsh-Hadamard transformation, it could push the state of the computer to an equilibrium. The overall transformation accomplished by a cycle was still a permutation, but applying it in this way allowed its speed 1 As the name implies, this has one control, giving it a total of two inputs and two outputs. Due to the unitarity of the time evolution operator, all quantum transformations must be unitary. 2 Chapter 4. Quantum computers 77 to be controlled, and the calculation to be stopped at the solution state. Feynman’s solution was rather different. He noted that, for an arbitrary matrix T, T + T† was automatically self-adjoint. Thus, he was allowed to define the successor of the solution state to be the state itself in constructing T. The problem is that this is analogous to the situation for spin waves in a one-dimensional crystal, which can propagate in either direction depending on their initial momentum. Here, the computation can proceed in either direction, obliging us to give the system enough momentum to keep it going in the right direction. The problem is that, for long calculations, thermal fluctuations or other effects can contrive to reverse the computer’s direction and make it undo what it has done. Thus, the computer’s evolution will have an element of Brownian motion, and there can be no guarantee that it will complete the calculation. 4.1.11 Error correction Quis custodiet ipsos custodes? (Who watches the watchmen?) The greatest practical difficulty involved in actually building a quantum computer is error correction. So far, we have been treating quantum systems as being completely isolated from the outside world, but clearly this cannot really be true. The effect of the computer interacting with its environment is equivalent to the application of random operators; for example, for one qubit we could have E(a|ei 0i + b|ei 1i) = a(c00 |e00 0i + c01 |e01 1i) + b(c10 |e10 1i + c11 |e1 10i), where E is an operator applied by the environment and ei summarizes the state of the environment. As well as simply mixing up the state of the computer, these interactions can allow information about the state of the computer to “leak out” into the outside world, which counts as a measurement. This can cause the delicate superpositional state of the computer to collapse; a process called decoherence. Chapter 4. 
Quantum computers 78 It was initially thought that these effects would make quantum computers of any useful size impractical, but Calderbank and Shor showed that quantum versions of error-correcting codes can be used to good effect [63]. As we shall see in Chapter 5, classical error-correcting codes encode the same information in several different places, allowing a certain number of errors to be detected and even corrected. The trick is to encode the possible states of the computer in a larger number of qubits, and to construct a set of operators which include at least the states corresponding to correct states of the computer in their set of basis states. This means that, if there has been no error, measuring any of these operators will not disturb the computer. If there has been an error, measuring one of these operators will collapse the state of the computer onto one of the basis states of the operator, converting a continuous error into a discrete error. Classical error-correcting codes have a parity-check matrix, which is used to identify errors: codewords give the null vector when multiplied by the parity-check matrix, and any other result can be used to diagnose the error. In the quantum case, this is replaced by combinations of operators from our set; the resulting set of measurements can be used to diagnose the (now discrete) errors. A potential problem with these error-correcting schemes is that they must inevitably be implemented on quantum hardware, which is itself subject to error. Thankfully, however, Shor and Kitaev independently showed [62] that it was possible for such systems to correct more errors than they introduced. Estimates of the requirements for such fault-tolerant quantum computing [64] are, however, severe: they can only tolerate an error probability of 10−5 per qubit per gate. This compares very favourably with the estimate of 10−13 for quantum computation without error correction, but it still represents a formidable challenge, particularly when we remember that the additional hardware required could well make the computer ten times larger. Another difficulty with error-correcting codes is that almost all the codes that have so far been constructed do not have a practical decoding algorithm (i.e. one that is polynomial in the length of the code block), and so would lead to exponential growth in the length of the computation. Recently, however, MacKay [65] has Chapter 4. Quantum computers 79 constructed the only known example of a practical error correcting code, based on a classical sparse graph code. The fact that error-correction is possible at all may seem very strange, in light of the discussion in Chapter 2, where we argued that any computationally useful physical system could be simulated in polynomial time on a conventional computer, due to the impossibility of packing more than a finite amount of information into a finite space without error. By contrast, we seem to be able to pack a superposition of 2N terms into a space that increases only linearly with N , leading to an exponential increase in the information required for simulation 3 . The trick here is that we can think of the terms in a superposition as effectively or actually being in separate universes, each of which is individually subject to the information bound. By packing all of these terms on top of each other, we have effectively circumvented the bound. 
On the other hand, this makes the effect of noise more significant, which is why we have to be aggressive in correcting errors before we see a benefit. 4.2 Quantum algorithms David Deutsch [25] is credited with the discovery of the first quantum algorithm which is demonstrably more efficient than any algorithm which could be executed on a classical computer. Since the algorithm is so simple, it provides us with a useful “toy model” with which to unpick the origins of this power. Deutsch considered the simplest possible boolean function f , which maps a single boolean value to another boolean value. There are only four such functions: two constant functions (f (0) = f (1) = 0 and f (0) = f (1) = 1) and two “balanced” functions (f (0) = 1, f (1) = 0 and f (0) = 0, f (1) = 1). Classically, there is no way of discovering whether a particular (unknown) function is constant or balanced without evaluating both f (0 and f (1); Deutsch discovered that, quantum 3 This is the crux of Feynman’s [23] argument that conventional computers were incapable of simulating a quantum system without an exponential slowdown. 80 Chapter 4. Quantum computers mechanically, the determination could be done in one step. √ The algorithm uses two types of quantum gate: a not gate, and one which √ evaluates the function. The not gate can be represented by the (unitary) matrix i 1 1 ; N=√ 2 1 i the name comes from the fact that N2 = i 0 1 1 0 , meaning that N 2 acts as an ordinary NOT gate (up to an overall phase factor). The function evaluation gate acts as F = (−1)f (0) 0 0 (−1)f (1) , i.e. it selectively multiplies the terms in the superposition by a phase factor. Using these gates, the algorithm simply computes NFN, which is i((−1)f (0) + (−1)f (1) ) 1 (−1)f (1) − (−1)f (0) . NF N = 2 i((−1)f (1) + (−1)f (0) ) (−1)f (0) − (−1)f (1) From this, we can see that the result (ignoring phase factors) is the identity if the function is balanced, and a NOT gate if the function is constant. Thus, if the computer starts in the pure state |0i, say, it will finish up in the state |0i if the function is balanced and |1i if it is constant. Deutsch has pointed out [66] that this algorithm has a very simple physical implementation, as shown in Figure 4.2; this gives us an interesting insight into what it is doing. The half-silvered mirrors serve to divide the beam in exactly √ the same way as the not gate, acting on the inputs, so we can think of the two paths as a qubit: Path 0 is |0i and Path 1 is |1i. The fact that the input beam follows both paths simultaneously shows that we have a superposition. If the devices placed along the paths are simply mirrors, the effect on the beam √ is equivalent to two not gates, so a beam presented to Input 0 would reappear Chapter 4. Quantum computers 81 Figure 4.2: Physical implementation of Deutsch’s algorithm. A half-silvered mirror divides an input laser beam in two, sending one part along Path 0 and the other along Path 1. The two beams interfere again at a second half-silvered mirror, and a pair of photo-sensitive detectors record the results. Taken from [66]. Chapter 4. Quantum computers 82 at Output 1. However, if we replace these mirrors by phase shifters, we can also implement the function evaluation gate. Clearly, what this algorithm does is to put the computer into a superposition of all possible states, simultaneously evaluate the function on all of them, and then to recombine the results using interference. 
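The whole algorithm fits into a few lines of linear algebra, which makes the claim about the final state easy to check; the sketch below simply reproduces the N F N product above for all four one-bit functions.

import numpy as np

# Deutsch's algorithm as a matrix product: sqrt(NOT), function gate, sqrt(NOT).
N = np.array([[1j, 1],
              [1, 1j]]) / np.sqrt(2)

def F(f):
    # Function evaluation gate: multiply each term by (-1)^f(x).
    return np.diag([(-1.0) ** f(0), (-1.0) ** f(1)])

functions = {"constant 0": lambda x: 0, "constant 1": lambda x: 1,
             "identity": lambda x: x, "negation": lambda x: 1 - x}

zero = np.array([1.0, 0.0])                   # start in the pure state |0>
for name, f in functions.items():
    final = N @ F(f) @ N @ zero
    # Balanced functions leave the computer in |0>, constant ones in |1>.
    print(f"{name:10s}  P(0) = {abs(final[0])**2:.1f}  P(1) = {abs(final[1])**2:.1f}")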
In a way, it is cheating: collapsing a series of function evaluations into one simultaneous evaluation. This is an invariable motif in quantum algorithms: their power comes from their ability to perform operations in parallel without requiring extra resources. (In consequence, any calculation which cannot be parallelized will run no faster on a quantum computer than a classical computer.) The trick with algorithms such as this really lies in the recombination phase: we must arrange for the phases to fall in such a way that any measurement corresponds to a definite answer to our question. In Deutsch’s algorithm, the two answers are equally probable, so it was possible to arrange for some terms to cancel out completely; when this is not the case, the “signal” becomes weaker. For example, Deutsch and Josza [2] have considered an n-qubit version of this algorithm, which asks about an n-bit boolean function f . This time, the question can only be approximate: f is said to be balanced if the fraction of inputs for which f is 1 is approximately 1/2, and constant if one output value predomi√ nates. Rather than using the not gate, this uses a series of Walsh-Hadamard transformations, but the principle is broadly the same. This algorithm is not so easy to implement physically (we’d need to split the input into 2n beams) but it again trades on the fact that the fraction of phase shifts is either about 1, 1/2 or 0. If we were to try it with a function that was neither constant nor balanced, and to ask which was the better approximation, we would find that the algorithm started to make errors. We would find that there was some probability of making either measurement on any given trial of the experiment, and that what we were looking for was encoded in the relative probabilities. In this case, we would either have to keep repeating the experiment until we had enough data to give a reliable estimate, or make more function evaluations. 83 Chapter 4. Quantum computers 4.3 Grover’s database search algorithm Imagine, for a moment, an extreme version of Deutsch’s problem, where we know that a function is approximately balanced but want to know whether it is exactly balanced. In other words, is f (x) = 0 for all x, or is there some x for which f (x) = 1? This is equivalent to asking whether a particular entry is in an unsorted database, and, classically, may require us to search through the entire database. Grover’s algorithm, remarkably, is able to search an n-entry database in a time √ proportional, not to n, but to n. The algorithm is, fundamentally, very similar to Deutsch’s. If we were to try the n-qubit version of the algorithm on this problem, we would find that a measurement would almost certainly come down in favour of the function being constant, but that there was a very small probability of it telling us that the function was balanced. Say, for example, that the effect of the device is described by the operator √ a 1 − a2 √ 1 − a2 −a . In the limit as a → 0, this is equal to the “constant” result of the basic Deutsch device, and the other limit is equal to the “balanced” result, so this is at least a plausible approximation. In this case, a is either very small or zero, so the probability of obtaining a result of “balanced” is very small in either case. Grover’s trick amounts to combining this with an “inversion about the average” gate (which adds a phase factor of −1 to all states except |0i) and then sending the output back round to be the input to a second application of the device. 
The combined effect of two tours round the device and two “inversions about the average” is 1 0 0 −1 √ a 1 − a2 √ 1 − a2 −a √ 2a2 − 1 2a 1 − a2 = . √ −2a 1 − a2 2a2 − 1 2 The probability of finding a result of “constant” is now 4a2 (1 − a2 ), which is √ larger than a2 so long as 0 < a < 3/2. Thus, repeated applications of this process will boost the probability of measuring “constant” until it reaches three 84 Chapter 4. Quantum computers quarters unless a = 0, in which case the probability of measuring “constant” will always be zero. In addition, the process has its greatest effect when a is very small, where it essentially doubles the probability with each iteration. While this effect of successively doubling the probability does not continue indefinitely, it does explain why the algorithm is able to achieve a performance gain. If we denote by cn and bn the amplitudes for measuring the function to be constant and balanced respectively after n iterations of the device, we have that c1 = b, b1 = a and √ 1 − a2 bn−1 + acn−1 √ = abn−1 − 1 − a2 cn−1 , cn = bn meaning that the probability of measuring the function to be constant is √ c2n = (1 − a2 )b2n−1 + a2 c2n−1 + 2a 1 − a2 bn−1 cn−1 . Here we can clearly see the interference process at work; if the device had been behaving as a classical probabilistic process, we would not have had the last term on the RHS. This is what is speeding up the algorithm: the interference is consistently boosting one probability and diminishing the other. Grover’s algorithm [5] is similar to this in spirit, but slightly different in practice, in that it is designed to identify the database item as well as simply proving its presence. To see how the algorithm works, it is easier to consider a related algorithm, by Farhi and Gutmann [67]. (An entertaining analysis of Grover’s original algorithm can be found in [68].) Here, we consider a quantum system with N basis states and a Hamiltonian proportional to one of these, say w; The task is to discover w. From before, we have the Schrödinger equation, i ∂ |ψi = H|ψi, ∂t where, for this problem, H = |wihw|. To solve the problem, Farhi and Gutmann add an additional term to the Hamiltonian, namely 1 (|1i + · · · + |N i)(h1| + · · · + hN |). N To see why this helps, consider 85 Chapter 4. Quantum computers the subspace spanned by |wi and |si = √1 N P k6=w |ki; in terms of this, the addi- tional term is just ( √1N |wi + |si)( √1N hw| + hs|). This means that we can write the new Hamiltonian, H0 , as 1 1 |wihw| + |sihs| + √ (|wihs| + |sihw|). H = 1+ N N 0 µ ¶ The point about this Hamiltonian is that (to order 1/N ) the state |si is as likely to evolve into |wi as |wi is to evolve into kets; slightly more so, in fact. If we start √ √ off from the state 1 N (|1i + · · · + |N i) = 1/ N |wi + |si, we therefore find that the amplitude for state |wi steadily increases. Once enough time has elapsed, the probability that a measurement of the system will yield |wi is of order one, i.e. √ almost certain. Initially, the 1/ N terms ensure that the probability of observing √ |wi grows in proportion to 1/ N . This is closely analogous to the version based on Deutsch’s algorithm, where the probabilities of a state flipping from constant to balanced or vice versa is about equal, but one probability is initially much higher than the other4 . The difference here is that a measurement yields the anomalous state itself, not just an indication that one exists. 
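The amplitude bookkeeping behind this boosting is easy to simulate on a classical machine (which is, of course, part of the point of Section 4.5). The sketch below runs Grover's original discrete iteration – a phase flip of the marked item followed by “inversion about the average” – on the full amplitude vector, with the database size and marked entry chosen arbitrarily; the success probability rises, peaks after roughly (π/4)√N iterations, and then falls again, the same qualitative behaviour as the closed-form result that follows.

import numpy as np

# Simulation of Grover's discrete iteration on the full vector of amplitudes.
# This is only bookkeeping: classically we still track all N amplitudes.
N, marked = 256, 97                      # database size and marked entry (arbitrary)
amps = np.full(N, 1 / np.sqrt(N))        # uniform starting superposition

for step in range(1, 21):
    amps[marked] *= -1                   # oracle: flip the phase of the solution
    amps = 2 * amps.mean() - amps        # inversion about the average
    if step % 4 == 0:
        print(f"after {step:2d} iterations  P(marked) = {amps[marked]**2:.3f}")
# The probability peaks near (pi/4)*sqrt(N), about 12 iterations here, then falls.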
Farhi and Gutmann’s analysis also yields a closed-form result for the probability of measuring the system to be in the state |wi at time t, P (t): P (t) = sin2 (Ext) + x2 cos2 (Ext), √ where x = hs||wi = 1/ N 5 . This shows that P (t) is exactly one at a time √ of (π/2) N , but it also shows that, if we leave measuring the system for too long then the probability will actually go down. This is a general feature of this style of algorithm, and is essentially a resonance effect: Grover showed that a version of the algorithm could also be implemented using coupled pendulums [75]. Unfortunately, this feature becomes particularly awkward if we try to apply it to problems with more than one solution. If we try to apply it to a problem with t solutions [7], the time required is of order q N/t to find a superposition of the solutions. This is acceptable if we know how many solutions there are, but the 4 Note, however, that in Deutsch’s algorithm the flip probability is the dominant one. We have taken the liberty of normalising the energy of the system so that Farhi and Gutmann’s E is one. 5 86 Chapter 4. Quantum computers more approximate our estimate of their number is, the greater the probability is that the algorithm will fail. For example, imagine that we want to solve a satisfiability problem by using successive applications of Grover’s algorithm to remove all the terms in the superposition that violate each of the clauses in turn. In other words, at step n we want to find a superposition of all the allocations which satisfy the first n clauses. It is, in principle, possible to do this [69] by introducing a family of gates F i which act as the identity on states satisfying the first i clauses and annihilate all other states. The gate combination Fi−1 GFi−1 + (I − Fi−1 ) – where G is an iteration of the Grover algorithm and I is the identity – achieves the appropriate result at the ith step in a unitary fashion. Applying this combination applies the Grover algorithm only to those states which have passed all the previous tests, and leaves the amplitudes of all other states as they are. If the Grover algorithm could perfectly weed out the satsfying allocations for the current clause from those it is given, then the end result of this process would be a superposition of exactly the states that satisfy the given problem. What’s more, each step potentially only has to weed out one eighth of the clauses (for a 3-SAT problem), meaning that t ≈ 7N/8, and that the time required is q 8/7 ≈ 1. In short, this algorithm appears to be able to solve the problem in a time linear in the number of clauses. The problem, of course, is that this requires us to have a very precise knowledge of the fraction of the remaining states which will satisfy the current clause; otherwise, the probability of failure will grow as the algorithm proceeds. Even if the success probability at each step is 99%, the probability of making it all the way through 2000 clauses is about 10−7 %, i.e. negligible. On a more positive note, if the fraction of solutions is exactly a quarter, Grover’s algorithm is guaranteed to find the answer in one step [70]. In the general case, where we have no special knowledge to exploit, it has been shown that Grover’s algorithm is optimal [8] in terms of the number of database queries6 . (The proof depends on the fact that any search algorithm must be 6 Improvements can, however, be made in terms of the total number of non-query operations; Chapter 4. 
Quantum computers 87 capable of redistributing the probabilities widely enough that the answer can be found, irrespective of what it turns out to be. With only unitary transformations available, there is a limit to how fast this redistribution can happen.) Thus, as we might have expected, the success of algorithms like these is intimately related to our knowledge of the search space. 4.4 Shor’s factoring algorithm Grover’s algorithm is impressive, but ultimately it can only provide a polynomial speedup. The first – and, so far, only – algorithm to demonstrate that a truly exponential speedup was Shor’s algorithm for factoring integers [4]. The best classical algorithm known for factoring large numbers is the “number field sieve” (see e.g. [71] for details), which has a worst-case running time of O(exp[c(log n)1/3 (log log n)2/3 ]) for factoring n, where c ≈ 1.9. This makes it significantly better than e.g. brute- force trial-division, in that it is not truly exponential in the number of bits in the number. However, it is not polynomial either; it is sub-exponential . Shor’s algorithm, by contrast, is genuinely polynomial. The algorithm is based on Fermat’s Little Theorem, which states that, for any n, np−1 modp = 1 for any prime p which does not divide into n. This can be extended to state that xr modn = 1 for any numbers x and n and some r. If r is even, then (xr/2 − 1)(xr/2 + 1) = xr − 1 = 0modn. Thus, xr/2 + 1 and xr/2 − 1 both divide n. If r is as small as possible, this means that either xr/2 − 1 must divide one of the factors of n or xr/2 + 1 must do. In the first case, we have xr/2 = 1modp, where p is a prime factor of n. The original version of Fermat’s Theorem now applies, so we have that r/2 = p − 1, i.e. we have found one of the prime factors. Shor showed that the probability of this approach paying off for a number with k prime factors is at least 1 − 1/2k−1 , so there is always at least a 50% chance of success. see [74]. 88 Chapter 4. Quantum computers The hard part is computing r, and this is where the uniquely quantum nature of the algorithm emerges. Classically, it is possible to compute xa modn for some i given a quite efficiently: repeatedly square xmodn to get x2 modn for all i such that 2i < r, then multiply these together in appropriate combinations. This takes broadly of order O((log2 n)2 ); the problem is that we need to do this for all a < r. With a quantum computer, however, we can do better: by starting off from a superposition of states P a |a; 0i, we can apply the classical algorithm to all the states simultaneously, to form P a |a; xa modni7 . The difficulty now, of course, is picking out the correct answer, i.e. the state |r; 1i. Shor’s algorithm does this by means of a quantum Fourier transform. All functions can be rewritten (or decomposed ) in terms of a suitable set of basis functions. For example, the set {xn |n ≥ 0, n ∈ Z} – i.e. the polynomials – form one such set, implying that any function can be represented as a polynomial with suitable coefficients. The set of sine functions – i.e. {sin(2πsx)|s > 0, s ∈ R} – form another basis set, and the Fourier transform F (s) of a function f (x) gives their coefficients. The transform is given by F (s) = Z ∞ −∞ f (x) exp(−i2πxs)dx. 
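Before returning to how the transform is evaluated, it is worth separating out how much of the algorithm is purely classical. The sketch below factors a small number by finding the order r by brute force – precisely the step the quantum Fourier transform is there to replace – and then reading factors off the greatest common divisors of x^{r/2} ± 1 with n; the choices n = 15 and x = 7 are arbitrary illustrative values.

from math import gcd

# Classical skeleton of the factoring argument (illustrative only): find the
# order r of x modulo n by brute force, then extract factors via gcd.
def order(x, n):
    r, y = 1, x % n
    while y != 1:
        y = (y * x) % n
        r += 1
    return r

n, x = 15, 7                      # small example; x chosen coprime to n
r = order(x, n)                   # here r = 4
if r % 2 == 0:
    f1 = gcd(pow(x, r // 2) - 1, n)
    f2 = gcd(pow(x, r // 2) + 1, n)
    print(f"order of {x} mod {n} is {r}; factors found: {f1}, {f2}")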
Clearly, except in the cases where this integral can be calculated analytically, determining the transform exactly is not feasible, and we have to fall back on the discrete Fourier transform (DFT), which is F (2πr/Tr ) = NX 0 −1 f (xk ) exp(−i2πrk/N0 ), k=0 where Tr is the period if the rth component of the transform and N0 is the number of samples of the function f in one period. As written, this transform would require somewhere in the region of N02 computational steps; in fact, the best known algorithm for computing the DFT – the fast Fourier transform (FFT) – is linear in N0 . The important point, however, is that the running time of any classical algorithm is polynomial in N0 . 7 The notation |a; bi is intended to summarise a state where the first half of the qubits represent a and the second half represent b. 89 Chapter 4. Quantum computers By contrast, the quantum Fourier transform which lies at the heart of Shor’s algorithm takes advantage of quantum parallelism to give a running time which is polynomial in the number of bits of N0 – an exponential speedup. This transform operates on quantum states |ai rather than integers, and transforms each such state into X 1 q−1 q 1/2 c=0 |ci exp(2πiac/q), where q and a are integers and 0 ≤ a ≤ q; q is playing the rôle of N0 above. At first sight, this would appear to require essentially the same number of steps as the classical version: we still have to perform a sum over q terms. The crucial difference is that we are now able to apply operations simultaneously to a large number of states; Shor gives an example algorithm which uses a sequence of Walsh-Hadamard transformations and single-qubit rotations to perform an exact DFT in polynomial time. As Shor notes, however, this procedure is not very practical: it involves phase rotations of the order of π/2log q , which can be very small indeed when q is large. On the other hand, the small size of these rotations indicates that, with care, it may be possible to neglect them and still perform a useful approximate transform. Coppersmith [16] has provided just such an algorithm, and it leads to an error of at most 2πL2−m in each phase angle, where L is the number of bits of q and m is a measure of the desired accuracy. This yields a transform which requires no phase rotations smaller than π/2m−1 . As Coppersmith notes, values of L = 500 and m = 20 yield a maximum error of 3/1000. In addition, neglecting small rotations means that Coppersmith’s algorithm requires fewer steps, and it also turns out to be efficiently parallelizable. Coming back to the factoring algorithm, the effect of this transform is to perform a “frequency analysis”, in much the same way as the classical version would when applied to a sound wave. Since the data is periodic with period r, this gives a series of sharp peaks, spaced q/r apart. This means that any measurement will almost certainly give the value of one of the peaks, say qd/r. We know q, and can estimate d and r by finding the values that most accurately match d/r. Even in the worst case, it only takes a few attempts to find r. Chapter 4. Quantum computers 90 None of the computational elements of Shor’s algorithm are directly responsible for the speedup it achieves; although they apply quantum parallelism, we still have the problem that the results are encoded in a probability distribution, of which we can only measure one point. The genius of the algorithm is that it creates a distribution which is very sharply peaked, with a definite period. 
As a result, even one point is able to give us useful information. Without this great regularity, we would not have gained much. This illustrates one of the central features of powerful quantum mechanical algorithms: it is easy to apparently do a lot of work and encode a large amount of information in a quantum state, but hard to read out anything useful without a lot of effort. It is probably fair to say that the only time we can hope to obtain a quantum speedup is for problems where classical algorithms have to compute a large, complex distribution, but then only ask about some of its grossest features. This is where the gain comes from: quantum mechanics seems to allow us to put off the moment when, to reduce the workload, we have to make a wild approximation; classically, the only way to reduce the workload would be to compute a very much more approximate transform, and this would wash out even the gross features.

A useful picture to keep in mind when thinking about this question is that of a multiple-slit experiment. Classically, we can use water waves; quantum mechanically, we can use laser light or electrons. The classical version has to be much larger, and uses a large volume of water, but it automatically shows you the whole interference pattern. The quantum mechanical version also requires an intense laser beam in order to show the whole pattern, but it can also say something about the pattern with a single photon. Somehow, by according reality to the wavefunction even in this low-intensity limit, we have been able to make it do work for us. As always, though, less work/energy must lead to less information out: the trick is to make it the right information.

4.5 What makes quantum computers powerful?

As mentioned in the introduction, quantum mechanics is essentially just an ordinary wave theory (admittedly over a complex-valued field); the only thing that is peculiar about it is that we are using it to describe particles, not waves. Given this, it is instructive to enquire as to how far we can go in simulating quantum systems using purely classical waves. The answer, it turns out, is quite far: it is possible to build simulations which have all the properties required to implement both Grover's and Shor's algorithms, albeit with some loss of efficiency. In the following sections, we will use these to highlight the reasons for the apparently greater power of their quantum counterparts.

4.5.1 Classical versions of Grover's algorithm

From the description of Grover's algorithm given above, it is clear that its effect is due to a combination of entanglement and interference. Interference is also a property of classical waves, but entanglement is not; interestingly, however, it is possible to design a version of the algorithm without entanglement. We can see this by considering the physical version of Deutsch's algorithm shown in Figure 4.2. This works either as a fully quantum system, with only a single photon as input, or as a classical wave system. (Clearly, it will work just as well with an intense laser beam as with a single photon, in which case we would not expect to have to consider quantum effects.) There are two important differences between the quantum and classical versions of Deutsch's algorithm: we have to split the input wave into one beam for every possible function input, and the input wave has to be intense enough for all the beams to have a reasonable amplitude.
This makes the classical machine exponentially larger, and requires exponentially more power, but it does not take significantly longer to execute the algorithm. (The necessary beam splitting has to be accomplished by using a set of N splitters, but each beam will emerge having only encountered log N of them.)

In the same spirit, a neat demonstration of Grover's algorithm is given by Lloyd [18], where he considers an array of coin slots, one of which flips the coin over on its way through. Lloyd actually considers a quantum version of the algorithm which uses different translational modes of a single neutron, thus precluding entanglement, but we can equally well use classical waves. If we imagine that the slots invert the polarisation of a laser beam, we find that we can extend Deutsch's algorithm, much as before: split the beam so that it goes through all the slots simultaneously, recombine it, and then "invert about the average". If we send in an intense initial pulse, then, over time, the beam will focus on the slot which does the inversion. Again, the time required to execute the algorithm is no worse than in the quantum case, though the other requirements are exponentially larger.

Interestingly, once we submit to the exponentially higher energy requirements of these classical algorithms, it is actually possible to find a faster algorithm. If we simply want to ascertain the presence or absence of a database entry, then one application of the n-qubit Deutsch algorithm is all that is required. The small probability of measuring "balanced" after one pass through the apparatus has now become a small amplitude, which we can measure straight away. A device based on this principle has even been built, by Walmsley [19]. It uses an acousto-optic modulator, which can be induced to have different densities at different points, thereby storing a database of information. If a broad spectrum of light is shone on the modulator, then regions of different density will have a different effect on the waves passing through it, causing them to bend to a greater or lesser degree. By splitting the light beam in two, sending one beam through the modulator, and then causing the two to interfere, these differences become measurable.

4.5.2 Classical entanglement

Entanglement is an intrinsically quantum property, but it turns out that we can nonetheless simulate it quite effectively [72]. Provided that the classical system used to do the simulation has as many degrees of freedom as the quantum one, careful use of beam-splitters, mirrors and other devices can allow the system to be manipulated as if it were an entangled quantum system. If we consider a laser beam from the perspective of classical electrodynamics, it offers us two degrees of freedom, e.g. the horizontal and vertical polarization components. The state of the beam can be described by the two (complex) amplitudes of these components, which are independent. In consequence, we can think of the beam as a classical qubit; what Spreeuw [72] calls a cebit. Problems arise when we try to consider analogues of multiple-qubit entangled states. For example, a three-qubit state has $2^3 = 8$ degrees of freedom:

$|\Psi\rangle = c_{000}|000\rangle + c_{001}|001\rangle + c_{010}|010\rangle + c_{011}|011\rangle + c_{100}|100\rangle + c_{101}|101\rangle + c_{110}|110\rangle + c_{111}|111\rangle.$

To achieve the same number of degrees of freedom in a classical system, we need four laser beams.
One way to achieve this is to number the laser beams in binary and associate the amplitude $c_{ij0}$ with the horizontal polarization amplitude of beam number $ij$ (and associate $c_{ij1}$ with the vertical polarization amplitude of the same beam). This allows any entangled state to be represented, but operating on the state is trickier. However, we can, for example, achieve a rotation of the third qubit by sending all four beams simultaneously through a block of material which rotates their polarization. Similarly, careful splitting and recombination can be used to simultaneously apply the operations required to simulate rotations on other qubits. Spreeuw's conclusion is that all the basic operations required for quantum computing – single-qubit rotations and two-qubit gates – could be carried out efficiently in this way. However, because the qubits are not localized (their description being spread across the laser beams), it is not possible to separate them. Real qubits in an entangled state can be separated, which leads to the rather strange conclusion that measuring the state of one qubit can affect the state of others, irrespective of how widely they are separated; these "non-local" effects cannot be emulated. (The most famous example of this is the EPR experiment, which is discussed further in Appendix A.)

4.5.3 Conclusions

We have just seen that everything required to perform quantum computations, even powerful ones such as Shor's algorithm, can be simulated classically without any slowdown. The price we pay for this is that the energy and space requirements of the algorithm increase exponentially. On the other hand, quantum probability amplitudes are simulated using real, measurable amplitudes. Thus, although a classical simulation of a quantum Fourier transform is costly, we can read off the whole transform rather than just a single point.

In the end, the power of quantum mechanics comes down to the fact that we have attributed reality to probability amplitudes, in the form of the wavefunction. This has two consequences: first, it allows us to save an exponential amount of energy by making the mere possibility of a particle do work; and second, it allows us to pack exponentially more degrees of freedom into the same space. The problem with this, as Shor's algorithm shows, is that fully realising these savings requires us to be able to work with only a small part of the information that a classical algorithm would automatically have given us.

If we relax the information requirement by e.g. turning up the intensity of our laser beam, the example of Deutsch's algorithm shows that we can achieve a more powerful algorithm: a speedup from $N$ steps to $\sqrt{N}$ steps becomes a speedup to a single step. As Černý realized [73], this makes solving NP-complete problems simple: form a superposition, try all possibilities simultaneously, and pick out the correct answer. On the other hand, living with the information requirement is very difficult – as demonstrated by the fact that Shor's algorithm is still the only one known to offer an exponential speedup – and seems to require a deep insight into the problem. In fact, it has been shown [1] that no algorithm that treats the problem like a "black box", to be queried by the algorithm but not understood, can achieve more than a polynomial speedup.
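As a concrete illustration of the sort of classical simulation described above, the following sketch (Python with NumPy; the encoding is my paraphrase of Spreeuw's scheme, and the particular state and rotation angle are arbitrary) stores the 8 amplitudes of a three-qubit state in the horizontal and vertical polarizations of four "beams", and applies a rotation of the third qubit by rotating the polarization of every beam.

import numpy as np

# Three-qubit amplitudes c[b1, b2, b3]: the beam is numbered by (b1 b2),
# and b3 selects its horizontal (0) or vertical (1) polarization component.
c = np.zeros((2, 2, 2), dtype=complex)
c[0, 0, 0] = c[1, 1, 1] = 1 / np.sqrt(2)         # a GHZ-like "entangled" state

theta = np.pi / 8
R = np.array([[np.cos(theta), -np.sin(theta)],   # polarization rotation, acting as a
              [np.sin(theta),  np.cos(theta)]])  # rotation of the third qubit

# Send all four beams through the same polarization rotator:
# apply R to the last index of c, i.e. to every beam at once.
c_rot = np.einsum('pq,ijq->ijp', R, c)

print(np.round(c_rot.reshape(8), 3))             # the 8 amplitudes after the rotation

The point of the exercise is the cost: n simulated qubits need $2^{n-1}$ beams, which is exactly the exponential space and energy overhead referred to above.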
4.6 Summary

We have seen that quantum computers do indeed appear to be more powerful than their classical counterparts, but that their abilities are more limited than they might at first appear. The speedup obtained by Grover's algorithm is essentially a consequence of the wave nature of quantum mechanics, compressed in space and energy by the ability to take advantage of probability amplitudes. Calculating using probability amplitudes makes obtaining the right information very difficult, and requires insight into the problem. In the end, quantum computers expand the available "toolbox" for solving problems, but only become powerful when we have enough insight to use the new tools they offer, as in the case of Shor's algorithm.

Chapter 5

Test case

As a means of evaluating the performance of different physical systems in practice, we now consider the example problem of decoding error correcting codes. This problem has the advantage that it is well-studied, both from the point of view of complexity theory, and in terms of the practical issues involved in decoding. In particular, the examples we will discuss have a natural formulation both as an Ising model problem and as a neural network problem. First, however, we must introduce error correcting codes, and their theoretical properties. As before, we can only give the briefest overview here; for further details, see [76].

5.1 Error correcting codes

To protect messages against transmission errors, error correcting codes are often used. These codes introduce redundancy into the message (i.e. encode each bit more than once); this way, small amounts of error can be detected and corrected, allowing perfect transmission even through an imperfect medium. The most common type of error correcting code is the linear code, where the message is encoded by multiplying it by the generator matrix of the code. We shall write this as

r = Gs + η,     (5.1)

where the vectors s, r and η represent the message, encoded message and transmission noise, and the matrix G represents the generator matrix of the code. To introduce redundancy, G has M rows and N columns, with M > N. The degree of redundancy introduced by a code is described by its rate, which is the ratio N/M. Codes with higher rates allow the transmission of more data over a given channel, but are typically less able to correct errors.

A desirable practical property of a code is that it be sparse, i.e. have relatively few non-zero entries. This reduces the encoding time, and allows the matrix to be compressed for distribution to the code's users. From now on, we will focus specifically on sparse binary codes, as these are the most commonly used in practice. In fact, we will be even more specific than that: we will focus on generator matrices with exactly two non-zero entries per row and four per column, such as the following example.

1 1 0 0 0 0
1 0 1 0 0 0
1 0 0 1 0 0
1 0 0 0 1 0
0 1 0 1 0 0
0 1 0 0 1 0
0 1 0 0 0 1
0 0 1 1 0 0
0 0 1 0 1 0
0 0 1 0 0 1
0 0 0 1 0 1
0 0 0 0 1 1

This code allows transmission errors to be both detected and corrected, because each bit of the message is encoded in four places. For example, say that the third bit of the received message was corrupted. If we assumed that this bit was correct, we would have to make a wrong assumption about the value of either the first or the fourth bit of the message. This wrong assumption, however, would result in wrong values for three bits of the received message.
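To make Equation 5.1 concrete for this example, here is a small sketch (Python with NumPy; the message and the choice of corrupted bit are arbitrary, and the arithmetic is mod 2, as usual for binary codes):

import numpy as np

# The example (2,4) generator matrix: 12 rows and 6 columns, with two
# non-zero entries per row and four per column.
pairs = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 3), (1, 4),
         (1, 5), (2, 3), (2, 4), (2, 5), (3, 5), (4, 5)]
G = np.zeros((12, 6), dtype=int)
for row, (i, j) in enumerate(pairs):
    G[row, i] = G[row, j] = 1

s = np.array([0, 1, 0, 0, 1, 1])      # an arbitrary 6-bit message
eta = np.zeros(12, dtype=int)
eta[2] = 1                            # the channel corrupts the third transmitted bit

r = (G @ s + eta) % 2                 # Equation 5.1, with arithmetic mod 2
print(r)                              # [1 0 1 1 1 0 0 0 1 1 1 0]

Decoding is then the problem of recovering s (or equivalently η) from r.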
Thus, provided that the noise level is not too high, we can decode the message simply by accepting the value which leads to the lowest level of implied noise in the transmission. It might appear from this that the level of correctable noise is very low: for example, errors in the first two bits of the transmission would leave the value of the first bit of the message undecided (as either choice would imply two transmission errors). However, such noise combinations are rare; if we are willing to accept the occasional wrong decoding, then we can tolerate much more noise.

To see exactly how much noise these types of code can tolerate, let us consider the example of a code with exactly two non-zero entries per row and four per column (like the example above). This has a very convenient representation in terms of an Ising spin glass model (something we shall discuss in more detail later). If we map the elements of the message vector s onto the spins of the model, then we can map the rows of the generator matrix onto bonds between the spins with non-zero entries in that row. For example, the top row of the above matrix can be taken to represent a bond between the first two spins. If the corresponding element of the encoded message vector r is 1, we define the coupling to be −J (as the "spins" must have been opposite in value to produce this answer); if it is 0, we define the coupling to be J. The Ising model corresponding to the above matrix is shown in Figure 5.1.

Figure 5.1: "Ising model" corresponding to the example generator matrix. This model has non-local bonds, but it is still planar, so it is still easy to find the ground state exactly.

With this representation, it is easy to see that the model with no noise is completely unfrustrated. Noise flips the signs of some of the bonds, leaving a model with a frustrated ground state. However, the directions of the spins will be unaffected provided no spin has more than one noisy bond. The probability of noise leaving the decoded message unaffected is thus given by the probability that no two noisy bonds are attached to the same spin.

We can approximately calculate this as follows. The number of ways that n noisy bonds can affect 2n different spins (out of a total of N) is given by the number of distinct ways of picking n pairs of different numbers from a total of N, i.e.

$\frac{N!}{(N-2n)!\,n!\,(2!)^n}.$

The probability that all the number pairs will actually correspond to bonds in the model is $(4/(N-1))^n$: of the $N-1$ spins that the first spin of a pair could have been paired with, 4 of them correspond to bonds in the model. Thus the number of ways of adding n noisy bonds without affecting the result is approximately

$\frac{N!}{(N-2n)!\,n!\,(2!)^n}\left(\frac{4}{N-1}\right)^n.$

The total number of ways of selecting n noisy bonds from the 2N in the model is just $(2N)!/((2N-n)!\,n!)$, so we get a total probability that the noise will not affect the result of

$P(N) = \frac{N!\,(2N-n)!}{(N-2n)!\,(2N)!}\left(\frac{2}{N-1}\right)^n.$

Unfortunately, this probability drops rapidly as N increases, as shown in Figure 5.2. This should not come as a surprise: for a fixed noise probability p, the probability that the four entries in the encoded message vector associated with a given message bit will be affected by no more than one bit of noise is $(1-p)^4 + 4p(1-p)^3$. The probability that none of the sets of entries associated with any of the message bits will be affected is then $\left((1-p)^4 + 4p(1-p)^3\right)^N$, which broadly matches the results shown in Figure 5.2.
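The closed form just quoted is trivial to evaluate; a quick sketch (Python; the 5% noise level is the one used in Figure 5.2):

p = 0.05                                     # noise probability, as in Figure 5.2
per_bit = (1 - p)**4 + 4 * p * (1 - p)**3    # at most one noisy bond among a bit's four

for N in (10, 30, 50, 70):
    print(N, round(per_bit**N, 3))           # about 0.87, 0.65, 0.49, 0.37:
                                             # a roughly exponential decay with N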
Figure 5.2: Probability of successful decoding, P(N), for a noise level of 5%. The success probability drops, apparently exponentially, as N increases.

To get round this, we introduce another sparse matrix, B, of size M × M, and use $G' = B^{-1}G$ as the generator matrix. The advantage of doing this becomes more apparent when we multiply Equation 5.1 through by B, to obtain

Br = Gs + Bη.

In the same way that G is used to encode each entry of s in several places, the introduction of the matrix B has the effect of encoding each noise bit in several places. This makes it possible to cross-check noise estimates as well as message estimates. Belief propagation methods are currently the best known decoding methods, because they can take advantage of this fact.

5.2 Complexity of the decoding problem

The decoding problem in the absence of noise essentially involves solving a set of linear simultaneous equations. If we recall that systems of linear equations over binary variables can be restated as XOR-SAT problems, then we can restate it as what initially seems like a somewhat peculiar version of XOR-SAT, with no variables negated but the requirement that some of the clauses not be satisfied. For example, the toy problem

$\begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} s = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}$

is equivalent to asking for boolean variables A, B and C that satisfy A XOR B and A XOR C but dissatisfy B XOR C. However, dissatisfying B XOR C is equivalent to satisfying either ¬B XOR C or B XOR ¬C. From this, we see that the decoding problem is actually equivalent to XOR-SAT. Since XOR-SAT is in P, it follows that the decoding problem in the absence of noise is also in P, as we would expect (given that Gaussian elimination is in P). As soon as we add noise, however, we have to solve MAX XOR-SAT, which is NP-complete. Feasible decoding is therefore restricted to problems which lie below the phase transition.

To get an idea of where the phase transition lies, we can ask about the region for which the naïve measure of success – the number of satisfied clauses – bears some relation to the real measure – the number of wrongly-set message bits. From the discussion in Chapter 1, we can guess that this is the region in which simple methods such as local search should work. To estimate the relative closeness of these two measures, we shall look at the probability that flipping a message bit from wrong to right will increase the number of clauses satisfied relative to the probability that a flip from right to wrong will achieve this.

Consider a problem whose generator matrix has n non-zero entries per row and m per column, and where the noise probability is $p_e$. Imagine, also, that the entries in our current estimate of the message vector are correct with a probability $p^1_c$. If we look at the set of rows of the generator matrix which have an entry for our chosen message bit, the first thing we need to know is the probability that the contribution from the other message bits represented in each row is correct. The probability $p^{n-1}_c$ that an arbitrary sequence of $n-1$ results will turn out to give the correct answer is given by the relation $2p^n_c - 1 = (2p^1_c - 1)^n$ (applied with $n-1$ in place of $n$), and hence the probability that a correct answer for the chosen message bit in a given row will give a correct answer for the associated encoded message bit is

$p^n_{ce} = p^{n-1}_c(1 - p_e) + (1 - p^{n-1}_c)\,p_e.$

Thus, the probability that the m rows will give c correct answers and m − c incorrect answers is
$\binom{m}{c}\,(p^n_{ce})^c(1 - p^n_{ce})^{m-c},$

where $\binom{m}{c}$ is a binomial coefficient. If c > m/2, then flipping the message bit to be correct will satisfy more clauses than flipping it to be wrong; if c < m/2, the reverse will be true. The region where the naïve and real estimates match more often than not is thus given by

$F(p_c, p_e) = \frac{\sum_{c=0}^{m}(1 - p^1_c)\,\Theta(c - m/2)\,\binom{m}{c}(p^n_{ce})^c(1 - p^n_{ce})^{m-c}}{\sum_{c=0}^{m} p^1_c\,\Theta(m/2 - c)\,\binom{m}{c}(p^n_{ce})^c(1 - p^n_{ce})^{m-c}} > 1,$

where $\Theta(x) = 1$ for $x > 0$ and 0 otherwise.

Figure 5.3: $F(p_c, p_e)$ for $p_e = 0.1$, for various matrix densities (n, m). For (2, 4), the naïve and error estimates are correlated, but as soon as the density rises above this they fail to be.

Figure 5.3 shows that unless n = 2, m = 4 we need to be quite close to a solution before the two error estimates are correlated. Thus, we would expect that this is the only value for which simple local search algorithms should work. Concentrating on these values, Figure 5.4 shows that noise values of up to about 20% can just about be tolerated.

Figure 5.4: $F(p_c, p_e)$ for n = 2, m = 4 and values of $p_e$ between 0.1 and 0.4. This shows that noise levels of up to about 20% can be tolerated.

These results also show why simple local search with a sparse G would not work: the correlation between the two accuracy estimates falls away drastically as we reach the answer. This means that we would expect such methods to approach the answer and then get stuck. However, since we have not involved the matrix B in the game, this is not surprising: the previous analysis shows that we cannot recover the answer just by finding the minimum error configuration.

The addition of the matrix B is particularly useful in taking us from close to the answer to finding it exactly. When we have a relatively accurate estimate of the message, the corresponding estimate of the error vector is also fairly accurate. The additional cross-checks introduced by B then make it easy to identify the remaining inaccuracies in the error estimate, and push the solution process on towards a solution. The preceding analysis has been very simple, but it shows many of the problem's basic features, which are borne out by more accurate analysis.

5.3 Cryptography

As well as protecting messages against noise in transmission and storage, error-correcting codes can be used as the basis of a public-key cryptosystem; the two best-known schemes are due to McEliece [87] and Kabashima, Murayama and Saad (KMS) [88]. (It is notable that, unlike the RSA system, no way has yet been found to adapt these schemes into a digital signature scheme [90].) Both schemes involve hiding the generator matrix of an error correcting code within another matrix, which is then given out as the public key. Users wishing to encode a message multiply it by the public key (effectively using it as a generator matrix), and then artificially add noise. The aim is that decoding the message should be impossible in practice without knowledge of the (hidden) error correcting code, and that reconstructing the underlying code from the public key should also be infeasible.
The idea behind these systems is in many ways similar to that behind other public-key systems, such as RSA. In RSA, the quantities required for decoding – the factors of the integer used as the public key – are also hidden in the public key, and the security of the system depends on essentially the same requirements. In fact, it is hard to see how a public-key system could be designed in any other way.

The original KMS scheme depended on a sparse error correcting code with generator matrix A, of dimensions M × N, and a sparse invertible matrix B of dimensions M × M. Using these, the public key matrix G is defined as

$G = B^{-1}A.$

The message sender uses G to construct an encoded message vector r from their message vector s as in Equation 5.1; the vector η is chosen to be random, but sparse. The legitimate recipient of such a message forms the product Br, in the knowledge that

Br = As + Bη.

Provided that the matrices A and B are sufficiently sparse, this can be solved by belief propagation. Any unauthorized recipient, by contrast, is faced with the decoding problem given by Equation 5.1, for which belief propagation does not converge.

The problem with this scheme is that extracting the matrices B and A from G turns out to be relatively straightforward; all that is required is to solve BG = A, in the knowledge that A and B are extremely sparse, which can effectively be done by enumeration. For example, we could enumerate all possibilities for a row of B, keeping those that would result in a suitably sparse row for A. The fact that this procedure only recovers B and A up to a permutation (and may conceivably recover completely different matrices) does not matter: the resulting matrices still have all the properties required for successful decoding.

To remedy this problem, an additional "scrambling" matrix R was added to the scheme, so that the public key became

$G = B^{-1}AR.$

The legitimate decoding process is very little different; by substituting ŝ = Rs, we obtain Br = Aŝ + Bη, which can be solved just as easily as before. (The vector s can then be recovered from $s = R^{-1}\hat{s}$.) The key feature of R in terms of the security of the scheme is that any invertible N × N matrix can be used; in other words, we are free to choose R to be dense.

The introduction of R frustrates the attack outlined above, as factoring G now requires the solution of BG = AR. The only known attack against the improved scheme that cannot be prevented by a carefully designed protocol is to randomly select N entries in r, together with the corresponding rows of G, and hope that the chosen entries were not affected by noise. If this turns out to be the case, then this reduces the problem to the form $r' = G's$, which is solvable as a system of linear simultaneous equations. (The attacker can also check the accuracy of the decoding by attempting to reconstruct r: if they find a noise level within the tolerance specified for the scheme, the right answer has been obtained; otherwise, some of the chosen entries must have been affected by noise.) This attack can clearly be frustrated simply by choosing M and N to be sufficiently large.

Schemes based on error correcting codes have an advantage over RSA, in that they allow faster encoding and decoding; however, distributing the public key (the matrix G) requires the transmission of much more data than for RSA. (Even the G of the original KMS scheme is quite dense: although B may be sparse, $B^{-1}$ is not.)
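The cost of the "select N entries and hope" attack described above is easy to estimate (a Python sketch; the parameter values are purely illustrative and not taken from the KMS proposal). If the ciphertext has M entries, of which K were corrupted by the artificial noise, the chance that N randomly chosen entries all avoid the noise determines the expected number of attempts:

from math import comb

M, N, K = 2000, 1000, 100     # ciphertext length, message length, noisy entries (illustrative)

p_clean = comb(M - K, N) / comb(M, N)   # probability that all N chosen entries avoid the noise
print(p_clean, 1 / p_clean)             # success probability per attempt, expected attempts

For these (fairly modest) parameters the expected number of attempts already comes out at around $10^{31}$, which is the sense in which the attack can be frustrated simply by choosing M and N to be sufficiently large.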
Because of this key-size problem, it would be preferable to find a scheme which resulted in a relatively sparse G. Since the sparsity requirements of belief propagation decoding are quite extreme, such a G could still prevent direct decoding by belief propagation; the problem is to find a choice that cannot easily be decomposed. In the next subsection, we discuss a scheme that has been proposed to get round this problem, and the weaknesses that the sparsity requirement introduces.

5.3.1 The Barber scheme

This scheme uses matrices A and B of the same form as those for the KMS scheme, but hides them in two public-key matrices, D = CAR and E = CBQ, where C, R and Q are all sparse. The encryption process then requires the solution of

Et = Ds

for the vector t, to which noise is then added to form the ciphertext r = t + η. To decode the message, we note that

Er = Et + Eη     (5.2)
   = Ds + Eη,     (5.3)

and hence, cancelling the common factor of C,

BQr = ARs + BQη     (5.4)
    = Aŝ + Bη̂,     (5.5)

which is the same problem that we had before, provided that η̂ is no more dense than η; this is ensured by choosing η to be the sum of a suitable number of (randomly chosen) columns of $Q^{-1}$. (The details of how to do this while not divulging information about Q need not concern us here.)

The advantage of this scheme is that the matrices D and E are both relatively sparse; the disadvantage is that this renders them vulnerable to attack. For example, we can attack D by attempting to solve $DR^{-1} = CA$, knowing that, because C and A are both sparse, CA is likely to be sparse as well. By analogy with the above, we attempt to find all possible solutions to Dx = c, where (hopefully) x is a column of $R^{-1}$ and c is a column of CA. Due to the sparsity of c, we can consider it to be a noisy version of the null vector and attempt a version of the "pick N rows and hope" attack outlined above; the "noise level" now implies that we will be quite likely to succeed. If we repeat this process, always picking rows to try to ensure that we don't find the same solution too many times (by e.g. picking one of the rows that was previously found to be non-zero on our next attempt), there is a reasonable chance that we will be able to extract a large number of solutions for x, and hence make a guess at $R^{-1}$ and CA. (As before, the fact that we can only hope to find it up to a permutation makes little difference to the usefulness of our solution.)

From our guess at CA, we can attempt to use the same sparse matrix factorisation techniques as before to extract C and A. A knowledge of C would then allow us to find BQ, and hence B and Q. There are two ways in which this attack could fail: either we could fail to find enough solutions to build a guess at $R^{-1}$, or we could find too many (in which case the task of finding a combination which provided an acceptable factorization for CA could prove infeasible). This last possibility we consider unlikely, given that D is a (disguised) error correcting code; the redundancy it introduces makes it most unlikely that anything but the correct $R^{-1}$ would produce a sparse result.

5.3.2 Discussion

Public key cryptosystems inevitably rely on the fact that the public key has hidden structure which is not apparent to anyone but the legitimate recipient of the message. With schemes such as RSA, which rely on the difficulty of factoring large numbers, the public key has exactly two (large) prime factors; its security relies on the fact that this does not make the factoring problem significantly easier.
In the case of the schemes discussed here, the security comes from the hope that knowing that the public generator matrix has a factorized structure does not make the scheme significantly easier to decode than one based on a completely random generator matrix. This is very reminiscent of the situation with hard NP problems: they too have a structure which seems to be impossible to exploit.

The structure of the KMS scheme does allow a small amount of information to leak out; just not enough to compromise its security. For example, the structure of its matrix A ensures that multiplying it by a vector entirely composed of 1s yields the zero vector. Thus, if Gt = 0, then we know that $Rt = (1\ 1\ 1\ \ldots)^T$. This is not much, but it does show that we have made the decoding problem at least slightly easier by imposing a structure on G.

5.4 Solving the decoding problem using physical models

We now consider two physical implementations of the decoding problem: as an Ising model and as a neural network.

5.4.1 Ising model formulation

We have already seen how some decoding problems – those with n = 2 and m = 4 – can be implemented in terms of "spin glass" Ising models. The preceding analysis has also indicated that these are precisely the problems that we can reasonably expect to decode. However, there are two problems with the idea of using an Ising model as a practical decoder. If we look at the model shown in Figure 5.1, we see that the model has non-local bonds, which are not physically viable, and that it is planar, making the problem of finding the ground state straightforward. In fact, it is possible to design a (2, 4) code of any size which can be mapped onto a planar (but not local) model, as Figure 5.5 shows.

Figure 5.5: By arranging the spins into a grid, simple nearest-neighbour bonds give most of the spins four bonds each. A few non-local bonds round the edges can then ensure that all the spins have four bonds without sacrificing the planarity of the model.

Potentially, wrapping the Ising model onto a sphere could solve the non-locality problem (at the cost of losing planarity), so a physical implementation might be possible for some codes. However, we are still faced with the fact that the Ising model's evolution implements only a fairly basic local search method, making it ineffective for large codes.

An interesting aside that comes out of this discussion is the point that we can design a (2, 4) code so that minimum cut is guaranteed to find the minimum-error solution. However, since we are unable to extend this idea to coding schemes which include the additional matrix B, this is not very useful.

5.4.2 Neural network formulation

It has been shown that neural networks are, in principle, capable of learning in a manner equivalent to an asynchronous version of the belief propagation algorithm [89]. This makes them much better suited to the task at hand, in that they are flexible enough to take advantage of the additional information on offer with the addition of the matrix B. This means that a carefully designed neural network can act as an efficient decoder. The reason for this improved performance over the Ising model approach is that the neural network is much more highly organized, and is capable of highly nonlinear behaviour, due to the threshold nature of the neurons.
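As a toy illustration of the Ising formulation above (a Python sketch; the received word is the corrupted codeword from the encoding example earlier in the chapter, and exhaustive search is feasible only because the example has just six spins), decoding amounts to finding the ground state of the spin model whose bond signs are set by the received bits:

from itertools import product

# Bonds of the example (2,4) code: one per row of the generator matrix.
bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 3), (1, 4),
         (1, 5), (2, 3), (2, 4), (2, 5), (3, 5), (4, 5)]

def ground_states(received):
    # Exhaustively find the minimum-energy spin configurations of the Ising
    # model whose couplings are fixed by the received bits (+J for 0, -J for 1).
    J = [1 if bit == 0 else -1 for bit in received]
    energies = {}
    for spins in product((+1, -1), repeat=6):
        energies[spins] = -sum(J[k] * spins[i] * spins[j]
                               for k, (i, j) in enumerate(bonds))
    e_min = min(energies.values())
    return [s for s, e in energies.items() if e == e_min]

received = [1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0]   # codeword of 010011, third bit flipped
for spins in ground_states(received):
    print([0 if s == +1 else 1 for s in spins])   # prints the message and its complement

Note that, because every row of the (2, 4) matrix has exactly two non-zero entries, the model has an exact global spin-flip symmetry, so even the exact ground state determines the message only up to complementation. That ambiguity aside, the minimum-energy configuration is the minimum-error decoding, which is what the minimum-cut aside above relies on.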
5.5 Summary

In this chapter, we have seen that the decoding problem for error-correcting codes is very delicate: unless the generator matrix is very sparse, decoding is effectively impossible, and even then it requires an additional matrix to provide cross-checks on the errors. Given this, it is perhaps no surprise that the Ising-like decoding scheme is hopeless, and must defer to the neural network-based one.

For the cryptographic problem, we have also seen how a straightforward decoding problem can be effectively hidden in what appears to be a much harder one, at least to someone who does not know the secret. Unfortunately, we have also seen that, although the codes are based on sparse matrices, it is difficult to formulate a public key which is sparse. It seems that we need at least one dense "scrambling" component to keep the system secure.

Chapter 6

Conclusions

Judge a man by his questions rather than by his answers.
Voltaire

Over the course of this thesis, we have seen how hard NP-complete problems can be: they can have complicated, intimately connected and exponentially large search spaces for which there is no reliable compass that we can realistically find. We have also seen, however, that the majority of even NP-complete problems are actually quite easy, or at least easy to approximate; it is only in quite a narrow region of their parameter space that they really show their teeth.

All the physical systems we have considered, from the purely mechanical to neural networks (artificial or real), can be thought of as implementing some type of algorithm. At one end of the scale, mechanical systems perform a local search which must always lead downwards, making it very likely that they will get stuck in a local minimum. The addition of a stochastic component – in systems such as our "consistency computers" or Ising models – can help to some extent, but the complicated structure of the search space of the hardest NP-complete problems still means that they are very unlikely to find a solution.

At the top of the scale lie neural networks and other systems – such as amorphous computers – with a similar degree of organisation. These are capable of integrating the information available in a problem in a principled way, analogous to belief propagation, and are therefore "smarter" than systems which are only capable of myopic local search. In addition, they are capable of "learning" by discovering heuristics which may work on at least a subclass of problems. In the end though, sadly, the hardest NP-complete problems appear so random that there are simply no cues for algorithm or physical system to use to gain a foothold. We are reluctantly compelled to conclude that, short of a miraculous insight, there is simply no way of solving these problems efficiently.

At first sight, quantum systems appear to have the capacity to turn all this on its head by performing an unlimited number of calculations in parallel. The problem is that the amount of information which can be read out at the end is no more than for a classical system, unless we want to throw an exponential amount of energy at the task. All is not lost, however: one thing that novel computing models, notably quantum ones, can do is to give us more tools with which to attack problems. The only case in which an exponential speedup proved to be possible – Shor's algorithm – shows us how new tools can be combined with insight into the problem to great effect.
It is likely that such tools will always be rare, and of limited use, but we have almost certainly not found them all yet. After all, it is only when a tool is needed that we come to realise that it is even there; for now, we have no way of knowing what other features of quantum, and perhaps even classical, systems may ultimately turn out to be of use. For that matter, we have no reason to suppose that our knowledge of physics is even close to complete: new discoveries may bring yet more possibilities.

6.1 Possibilities for further work

Given the rather negative conclusions of this study, we would recommend diverting attention away from the exact solution of hard problems in favour of stochastic or approximate solutions. Although we have said above that most physical systems do not implement very powerful algorithms, they do potentially work very fast. There is therefore scope for determining exactly how fast they can work, and how their performance depends on the type of problem. While they will not necessarily work well for all problems, or even over the full parameter space of one problem, there may well be niches for which their performance makes them a useful alternative to conventional computers.

The physical computers considered here have generally had to be built specifically for the purpose of solving one problem. In the case of the route-finder, the range of problems it could solve would arguably justify its construction, but most of the other devices would be hard to justify from a commercial viewpoint. By adding adjustable components – such as variable resistors and junctions in the case of the electrical circuit machines – their utility could be extended, especially if the adjustments could be put under the control of a conventional computer. In other words, it could be worthwhile to investigate the utility of novel computing devices as special-purpose "co-processors".

Appendix A

Miscellaneous quantum questions

A.1 Quantum "nonlocality" and the EPR experiment

Quantum theory is, like all physical theories, local. The Schrödinger equation, after all, is a straightforward partial differential equation. This was frequently questioned by Einstein, who was deeply unhappy with the idea of wavefunction collapse. Together with Podolsky and Rosen, he developed the famous EPR thought experiment, in an attempt to show the absurdity of the theory.

The experiment begins with a stationary atom in an excited state, which has spin zero. The atom's decay process induces it to emit two electrons and finish in its ground state, still stationary and with spin zero. Conservation of momentum and spin then imply that the electrons must have been emitted in opposite directions and with opposite spins; quantum mechanics implies that, until they are measured, the electrons are in the joint "entangled" state $|\uparrow\downarrow\rangle + |\downarrow\uparrow\rangle$. In other words, they are in a superposition of two states: either the first electron is spinning up and the second is spinning down, or the first electron is spinning down and the second is spinning up.

On measurement, we have to find one or other of these states, but this implies that measuring the spin of the first electron causes the second to also collapse into a definite state, regardless of how far apart the two electrons are. This looks suspiciously like action at a distance, though of course it is not, because there is no way of exploiting this to allow instantaneous transmission of information.
In most important respects, this situation is exactly the same as one where the electrons did have definite spins, but these were unknown until they were measured; two versions of the universe are overlaid by the superposition, and measurement merely picks one.

The reason we raise this well-known experiment is that an apparently small change can throw the whole question back into doubt again. Quantum mechanics is strictly linear, in the sense that solutions to the wave equation can be freely superposed, but non-linear extensions have been considered by some. An interesting result that comes of this is that such an extension could allow a Grover-style algorithm to converge much more quickly, and hence allow the solution of NP-complete problems in polynomial time. Another consequence, however, is that action at a distance does become possible, either in the sense described above or via communication between different "worlds".

The reason why this is possible, despite no apparent non-locality being directly introduced, is that the so-called no-cloning theorem is violated. This theorem – true for ordinary quantum mechanics – states that no unknown quantum state can be precisely copied. In non-linear theories, copying is possible, and this allows the participants in an EPR experiment to communicate. For example, the person who measures the first electron could choose to make the measurement either in the vertical or the horizontal plane, giving the second electron a definite spin value in one of these planes (but a random value if it is measured in the other plane). The person who measures the second electron can then determine which plane was used by making a number of copies of their electron and measuring half in each plane. The plane for which all the measurements match is the plane used by the first participant. In this way, we have achieved instantaneous communication, though admittedly just of a single bit of information.

For us, the important feature of this story is that the modification that allowed the efficient solution of NP-complete problems also introduced non-locality into the theory, implying that non-local theories can have greater computational power than local ones. It is particularly easy to see how communication by "Everett telephone" could speed the solution of a problem: simply create enough "worlds" to allow all possibilities to be tried, then ask the person who finds the answer to communicate it to two of the other worlds, and ask the recipients to communicate it to two others, and so on. In this way, a carefully designed "chain letter" could communicate the answer to all the worlds in polynomial time.

A.2 Are space and time discrete?

As we divide matter up into smaller and smaller pieces, quantum mechanics becomes a more and more appropriate tool with which to examine them. One of the distinctive features of quantum mechanics (the one, in fact, from which it derives its name) is that many things turn out to take only discrete values: electrons can spin either up or down, light comes in discrete packets with precise energies, and so on. Everywhere you look, quantities turn out to be multiples of a definite smallest unit. Given this, it is not too farfetched to ask whether space and time have smallest units as well; indeed, this is considered by many to be required for general relativity to be unified with quantum mechanics to produce a sensible theory of quantum gravity.
If space, time, and all the entities defined on them turn out to take only discrete values, it is a straightforward matter to choose units such that the simulation discussed above can be made precise. The smallest units of space and time are tiny – $10^{-35}$ m and $10^{-43}$ s – but the important point is that they are not zero; the slowdown involved in such a simulation would be prohibitive, but it would be polynomial. This means that a discrete spacetime would end up playing the rôle that we have otherwise attributed to Heisenberg's uncertainty relations: the minimum unit of space quoted above is also the minimum distance which can be observed (i.e. the minimum distance which can be identified as being different from zero); below this, the intrinsic uncertainty in the measurements swamps the measurement itself. As a result, even if space and time are not discrete, the requirements of reliable computation are likely to oblige us to treat them as if they were.

Bibliography

[1] Beals R, Buhrman H, Cleve R, et al., Quantum lower bounds by polynomials, J ACM 48 778–797 (2001) [quant-ph/9802049] [2] Deutsch D and Jozsa R, Rapid Solution of Problems by Quantum Computation, Proc. Roy. Soc. Lond. A 439 553–558 (1992) [3] Simon D, On the power of quantum computation, SIAM Journal on Computing 26 (5) 1474–1483 (1997) [4] Shor, P W, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer, SIAM Journal on Computing 26 (5) 1484–1509 (1997) [quant-ph/9508027] [5] Grover L K, A fast quantum mechanical algorithm for database search, Proceedings of 30th ACM STOC 212–219 (1996) [quant-ph/9605043] [6] Bennett C H, Bernstein E, Brassard G, et al., Strengths and weaknesses of quantum computing, SIAM Journal on Computing 26 (5) 1510–1523 (1997) [quant-ph/9701001] [7] Boyer M, Brassard G, Høyer P, et al., Tight bounds on quantum searching, Fortschritte der Physik 46 (4–5) 493–505 (1998) [9605034] [8] Zalka C, Grover's quantum searching algorithm is optimal, Phys. Rev. A 60 2746–2751 (1999) [quant-ph/9711070] [9] Wolfram S, Undecidability and Intractability in Theoretical Physics, Phys. Rev. Lett. 54 (8) 735–738 (1985) http://www.stephenwolfram.com/publications/articles/physics/85-undecidability/index.html [10] Mobilia M, Does a Single Zealot Affect an Infinite Group of Voters?, Phys. Rev. Lett. 91 028701 (2003) [cond-mat/0304670] [11] Adleman LM, Molecular Computation of Solutions to Combinatorial Problems, Science 266 (5187) 1021–1024 (1994) [12] Saul LK and Kardar M, Exact integer algorithm for the 2d ±J Ising spin glass, Phys. Rev. E 48 3221–3224 (1993) The 2d ±J Ising spin glass: exact partition functions in polynomial time, Nucl. Phys. B 432 641–667 (1994) [13] Braich RS, Chelyapov N, Johnson C, Rothemund PWK and Adleman LM, Solution of a 20-variable 3-SAT Problem on a DNA Computer, Science 296 499–502 (2002) [14] Kobayashi S, Horn clause computation with DNA molecules, J. Combin. Opt. 3 (2–3) 277–299 (1999) [15] Kaye R, Minesweeper is NP-complete, Math. Intell. 22 (2) 9–15 (2000) [16] Coppersmith D, An Approximate Fourier Transform Useful in Quantum Factoring, IBM Research Report RC 19642 (1994) [quant-ph/0201067] [17] Stewart I, The Dynamics of Impossible Devices, Nonlinear Science Today 1:4 8–9 (1991) [18] Lloyd S, Quantum search without entanglement, Phys. Rev.
A 61 (1) 010301 (2000) [19] Walmsley I et al., http://www.rochester.edu/pr/News/NewsReleases/ scitech/walmsleyquantum.html [20] Lloyd S, Rahn B and Ahn C, Robust quantum computation by simulation, [quant-ph/9912040] 122 Bibliography [21] Manz A et al., Imperial College Reporter 119 (2002) http://www. imperial.ac.uk/P3476.htm [22] Steane A, Quantum computing, Rept. Prog. Phys. 61 117–173 (1998) [quantph/9708022] [23] Feynman RP, Simulating physics with computers, Int. J. Theor. Phys. 21 467–488 (1982) Quantum mechanical computers, Found. Phys. 16 507–531 (1986) [24] Benioff P, Quantum-mechanical Hamiltonian models of discrete processes that erase their own histories – application to Turing-machines, Int. J. Theor. Phys. 21 (3–4) 177–201 (1982) Quantum-mechanical models of Turingmachines that dissipate no energy, Phys. Rev. Lett. 48 (23) 1581–1585 (1982) Quantum-mechanical Hamiltonian models of Turing-machines, J. Stat. Phys. 29 (3) 515–546 (1982) [25] Deutsch D, Quantum theory, the Church-Turing principle and the universal quantum computer , Proc. R. Soc. Lond. A 400 97–117 (1985) [26] Bransden BH and Joachain CJ, Introduction to Quantum Mechanics, Longman, Harlow, Essex (1989) [27] Preskill J, Caltech Quantum Computation course, lecture notes, http:// www.theory.caltech.edu/people/preskill/ph229/#lecture [28] Vergis A, Steiglitz K and Dickinson B, The complexity of analog computation, Math. Comp. Sim. 28 91–113 (1986) [29] da Silva Graça D, The General Purpose Analog Computer and Recursive Functions over the Reals, Universidade Técnica de Lisboa, MSc Thesis (2002) [30] Shannon C, Mathematical theory of the differential analyser , J. Math. Phys. MIT 20 337–354 (1941) [31] Pour-El MB, Abstract computability and its relation to the general purpose analog computer , Trans. Amer. Math. Soc. 199 1–28 (1974) Bibliography 123 [32] Campagnolo ML, Computational complexity of real valued recursive functions and analog circuits, Universidade Técnica de Lisboa, PhD Thesis (2001) Iteration, Inequalities, and Differentiability in Analog Computers, J. Complex. 16 642–660 (2000) [33] Rubel LA, Some mathematical limitations of the general-purpose analog computer , Adv. Appl. Math. 9 22–34 (1988) [34] Siegelmann HT and Fishman S, Analog Computation with Dynamical Systems, Physica D 120 214–235 (1998) [35] Church J, Am. J. Math. 58 435 (1936) [36] Turing AM, Proc. Lond. Math. Soc. Ser. 2 442 230 (1936) [37] Penrose R, The Emperor’s New Mind , [38] Aspvall B and Shiloach Y, A Polynomial Time Algorithm for Solving Systems of Linear Inequalities with Two Variables per Inequality, SIAM J. Comput. 9 (4) 827–845 (1980) Aspvall B, Plass MF and Tarjan RE, A Linear-Time Algorithm for Testing the Truth of Certain Quantified Boolean Formulas, Info. Proc. Lett. 8 (3) 121–123 (1979) [39] Hayes B, Can’t get no satisfaction, Am. Sci. 85 (2) 108–112 (1997) [40] Cook S, The complexity of theorem-proving procedures, Conference Record of the Third Annual ACM Symposium on the Theory of Computing 151–158 (1971) [41] Kullmann O, New methods for 3-SAT decision and worst-case analysis, Theor. Comp. Sci. 223 1–72 (1999) [42] Dantsin E, Goerdt A etal, A deterministic (2 − 2/(k + 1))n algorithm for k-SAT based on local search, Theor. Comp. Sci. 289 (1) 69–83 (2002) [43] Schöning U, A probabilistic algorithm for k-SAT based on limited local search and restart, Algorithmica 32 (4) 615–623 (2002) 124 Bibliography [44] Papadimitriou CH, On selecting a satisfying truth assignment, Proc. 
of the Conference on the Foundations of Computer Science 163–169 (1991) [45] Selman B, Kautz H and Cohen B, Local Search Strategies for Satisfiability Testing, DIMACS Series in Discrete Mathematics and Theoretical Computer Science 26 (1996) http://www.cs.washington.edu/homes/kautz/papers/ dimacs93.ps [46] Selman B, Levesque HJ and Mitchell DG, A new method for solving hard satisfiability problems, Proc. AAAI-92, San Jose, CA 440–446 (1992) [47] Selman B, Mitchell DG and Levesque HJ, Generating hard satisfiability problems, Artificial Intelligence 81 (1–2) 17–29 (1996) [48] Johnson DS, Aragon CR McGeoch LA and Schevon C, Optimization by simulated annealing: an experimental evaluation; part II, graph coloring and number partitioning, Operations Research 39 (3) 378–406 (1991) [49] Hoos HH and Stützle T, Local Search Algorithms for SAT: An Empirical Evaluation, J. of Aut. Reas. 24 (4) 421–481 (2000) [50] Folino G, Pizzuti C and Spezzano G, Parallel Hybrid Method for SAT That Couples Genetic Algorithms and Local Search, IEEE Trans. on Evol. Comp. 5 (4) 323–334 (2001) [51] Pumphrey SJ, Solving the satisfiability problem using message-passing techniques, Part II Physics Project Report, Cambridge University (2001) http://www.inference.phy.cam.ac.uk/is/papers/pumphreySAT.pdf [52] Parisi G, A backtracking survey propagation algorithm for K-satisfiability, [cond-mat/0308510] [53] Garey MR and Johnson DS, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, New York (1979) Papadimitriou CH, Computational Complexity, Prentice-Hall, Eaglewood Cliffs, NJ (1982) [54] Mézard M, Parisi G and Zecchina R, Analytic and Algorithmic Solution of Random Satisfiability Problems, Science 297 812–815 (2002) Bibliography 125 [55] Mézard M, Parisi G and Virasoro MA, Spin Glass Theory and Beyond , World Scientific, Singapore (1987) [56] Martin OC, Monasson R and Zecchina R, Statistical methods and phase transitions in optimization problems, Theor. Comp. Sci. 265 3–67 (2001) [57] Davis M Logemann G and Loveland D, A machine program for theoremproving, Commun. ACM 5 394–397 (1962) [58] Mertens S, Computational complexity for physicists, Computing in Science and Engineering 4 (3) 31–47 (2002) [59] Mertens S, Random Costs in Combinatorial Optimization, Phys. Rev. Lett. 84 (6) 1347–1350 (2000) [60] Berger B and Leighton T, Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete, J. Comput. Biol. 5 (1) 27–40 (1998) Crescenzi P, Goldman D, Papadimitriouu C, Piccolboni A and Yannakakis M, On the complexity of protein folding, J. Comput. Biol. 5 (3) 423–465 (1998) [61] Fredkin E and Toffoli T, Conservative Logic, Int. J. Theor. Phys. 21 219–253 (1982) [62] Shor PW, Fault-tolerant error correction with efficitent quantum codes, Phys. Rev. Lett. 77 (15) 3260–3263 (1996) Kitaev AY, Fault-tolerant quantum computation by anyons, Ann. Phys. 303 (1) 2–30 (2003) [quant-ph/9707021] [63] Calderbank AR and Shor PW, Good quantum error-correcting codes exist, Phys. Rev. A 54 (2) 1098–1105 (1996) [64] Steane AM, Space, time, parallelism and noise requirements for reliable quantum computing, Fortschr. Phys. 46 (4–5) 443–457 (1998) [quant-ph/9708021] Preskill J, Reliable quantum computers, Proc. Roy. Soc. Lond. 
A 454 469–486 (1998) [quant-ph/9705031] [65] MacKay D, Mitchison G and McFadden P, Sparse Graph Codes for Quantum Error-Correction, [quant-ph/0304161] Bibliography 126 [66] Deutsch D, Ekert A and Lupacchini R, Machines, Logic and Quantum Physics, [math.HO/9911150] [67] Farhi E and Gutmann S, An Analog Analogue of a Digital Quantum Computation, [quant-ph/9612026] [68] Grover LK, From Schrödinger’s Equation to the Quantum Search Algorithm, Am. J. Phys. 69 (7) 769–777 (2001) [quant-ph/0109116] [69] Kastella K and Freeling R, Structured quantum search in NP-complete problems using the cumulative density of states, [quant-ph/0109087] [70] Brassard G, Høyer P, Mosca M and Tapp A, Quantum amplitude amplification and estimation, [quant-ph/0005055] Høyer P, On arbitrary phases in quantum amplitude amplification, Phys. Rev. A 62 052304 (2001) [quantph/0006031] [71] Lenstra AK and Lenstra HW Jr, Algorithms in Number Theory, in Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, Elsevier, New York () [72] Spreeuw RJC, Classical wave-optics analogy of quantum-information processing, Phys. Rev. A 63 062302 (2001) A classical analogy of entanglement, Found. Phys. 28 361–374 (1998) Cerf NJ, Adami C and Kwiat PG, Optical simulation of quantum logic, Phys. Rev. A 57 R1477–R1480 (1998) Kwiat PG, Mitchell J, Schwindt P and White A, Grover’s search algorithm: an optical approach, J. Mod. Opt. 47 257–266 (2000) [73] Černý V, Quantum computers and intractable (N P -complete) computing problems, Phys. Rev. A 48 (1) 116–119 (1993) [74] Grover LK, Tradeoffs in the Quantum Search Algorithm, [quant-ph/0201152] [75] Grover LK and Sengupta A, From coupled pendulums to quantum search, [quant-ph/0109123] [76] Vanstone SA and van Oorschot PC, An Introduction to Error Correcting Codes with Applications, Kluwer Academic Publishers, Boston (1989) 127 Bibliography [77] MacKay DJC, Information Theory, Inference, and Learning Algorithms, http://www.inference.phy.cam.ac.uk/mackay/Book.html [78] Amit DJ, Modelling Brain Function, Cambridge University Press, Cambridge (1989) [79] Bush V, The differential analyzer: A new machine for solving differential equations, J. Franklin Institute 212 447–488 (1931) [80] Rubel LA, Digital Simulation of Analog Computation and Church’s Thesis, J. Symb. Logic 54 (3) 1011–1017 (1989) [81] Maass W, Networks of Spiking Neurons: The Third Generation of Neural Network Models, Electronic Colloquium on Computational Complexity (ECCC) 3 (031) [82] Siegelmann HT, Computation Beyond the Turing Limit, Science 268 (5210) 545–548 (1995) [83] Etessami K, Encoding and solving factoring with a SAT-solver , 2003 Informatics MSc project proposal [84] Berry MV, Quantal Phase Factors Accompanying Adiabatic Changes, Proc. Roy. Soc. Lond. A 392 (1802) 45–57 (1984) Quantum Chaology, Proc. Roy. Soc. Lond. A 413 (1844) 183–198 (1987) The Geometric Phase for Chaotic Systems, Proceedings: Mathematical and Physical Sciences 436 (1898) 631– 661 (1992) [85] Blum L, Shub M and Smale S, On a Theory of Computation and Complexity Over the Real Numbers – NP-completeness, Recursive Functions and Universal Machines, B. Am. Math. Soc. 21 (1) 1–46 (1989) Blum L, Cucker F, Shub M and Smale S, Complexity and Real Computation: A manifesto, Int. J. Bifurcat. Chaos 6 (1) 3–26 (1996) [86] See http://www.swiss.ai.mit.edu/projects/amorphous/ paperlisting.html and references therein. 
[87] McEliece RJ, A public-key cryptosystem based on algebraic coding theory, JPL DSN Progress Report 42–44 114–116 (1978) [88] Kabashima Y, Murayama T and Saad D, Cryptographical Properties of Ising Spin Systems, Phys. Rev. Lett. 84 9 (2000) [89] MacKay DJC, Bayesian Neural Networks and Density Networks, Nucl. Instrum. Meth. A 354 (1) 73–80 (1995) Probable Networks and Plausible Predictions – A Review of Practical Bayesian Methods for Supervised Neural Networks, Network-Comp. Neural 6 (3) 469–505 (1995) [90] Xu S, Doumen J and van Tilborg H, On the Security of Digital Signature Schemes Based on Error-Correcting Codes, Designs Codes Cryptogr. 28 (2) 187–199 (2003)

And finally. . .