Natural Computing Series

Mika Hirvensalo

Quantum Computing

Springer-Verlag Berlin Heidelberg, 2001

Preface

The twentieth century witnessed the birth of revolutionary ideas in the physical sciences. These ideas began to shake the traditional view of the universe dating back to the days of Newton, even to the days of Galileo. Albert Einstein is usually identified as the creator of relativity theory, a theory that is used to model the behavior of the huge macrosystems of astronomy. Another new view of the physical world was supplied by quantum physics, which turned out to be successful in describing phenomena in the microworld, the behavior of particles of atomic size. Even though the first ideas of automatic information processing are quite old, I feel justified in saying that the twentieth century also witnessed the birth of computer science. As a mathematician, by the term "computer science" I mean the more theoretical parts of this vast research area, such as the theory of formal languages, automata theory, complexity theory, and algorithm design. I hope that readers who are used to a more flexible concept of "computer science" will forgive me. The idea of a computational device was crystallized into a mathematical form as a Turing machine by Alan Turing in the 1930s.
Since then, the growth of computer science has been immense, but many problems in newer areas such as complexity theory are still waiting for a solution. Since the very first electronic computers were built, computer technology has grown rapidly. An observation by Gordon Moore in 1965 laid the foundations for what became known as "Moore's law": computer processing power doubles every eighteen months. How far can this technical progress go? How efficient can we make computers? In light of present knowledge, it seems unfair even to attempt to give an answer to these questions, but some estimate can be given. By naively extrapolating Moore's law into the future, we learn that sooner or later, each bit of information should be encoded by a physical system of subatomic size! Several decades ago such an idea would have seemed somewhat absurd, but it does not seem so anymore. In fact, a system of seven bits encoded subatomically has already been implemented [37]. These small systems can no longer be described by classical physics; rather, quantum physical effects must be taken into consideration.

When thinking again about the formalization of a computer as a Turing machine, a rewriting system, or some other classical model of computation, one realizes that the concept of information is usually based on strings over a finite alphabet. This strongly reflects the ideas of classical physics in the following sense: each member of a string can be represented by a physical system (storing the members in the memory of an electronic computer, writing them on sand, etc.) that can be in a certain state, i.e., contain a character of the alphabet. Moreover, we should be able to identify different states reliably. That is, we should be able to make an observation in such a way that we become convinced that the system under observation represents a certain character. In this book, we typically identify the alphabet and the distinguishable states of a physical system that represent the information. These identifiable states are called basis states. In quantum physical microsystems, there are also basis states that can be identified and, therefore, we could use such microsystems to represent information. But, unlike the systems of classical physics, these microsystems are also able to exist in a superposition of basis states, which, informally speaking, means that the state of such a system can also be a combination of basis states. We will call the information represented by such microsystems quantum information. One may argue that in classical physics it is also possible to speak about combinations of basis states: we can prepare a mixed state, which is essentially a probability distribution over the basis states. But there is a difference between the superpositions of quantum physics and the probability distributions of classical physics: due to interference effects, the superpositions cannot be interpreted as mixtures (probability distributions) of the basis states.

Richard Feynman [30] pointed out in 1982 that it appears to be extremely difficult to simulate efficiently, using an ordinary computer, how a quantum physical system evolves in time. He also demonstrated that, if we had a computer that runs according to the laws of quantum physics, then this simulation could be carried out efficiently. Thus, he actually suggested that a quantum computer could be essentially more efficient than any traditional one.
Therefore, it is an interesting challenge to study quantum computation, the theory of computation in which traditional information is replaced by its quantum physical counterpart. Are quantum computers more powerful than traditional ones? If so, what are the problems that can be solved more efficiently by using a quantum computer? These questions are still waiting for answers. The purpose of this book is to provide a good introduction to quantum computation for beginners, as well as a clear presentation of the most important presently known results for more advanced readers. The latter purpose also includes providing a bridge (from a mathematician's point of view) between quantum mechanics and the theory of computation: it is not only my personal experience that the language used in research articles on these topics is completely different.

Chapter 1, especially Section 1.4, includes the most basic knowledge needed for the presentation of quantum systems relevant to quantum computation. Chapter 2 is divided as follows: Turing machines, as well as their probabilistic counterparts, are introduced in Section 2.1 as traditional models of computation. For a reader interested in research on quantum computation but having little knowledge of the theory of computation, this section is designed to also include the basic definitions of complexity theory. In Section 2.2, the knowledge of quantum systems is deepened and the quantum counterparts of the computational models are presented. Sections 2.1 and 2.2 are intended for a reader who has a solid background in quantum mechanics, but little previous knowledge of classical computation theory. A reader who is well aware of the theory of computation may skip Section 2.1 as well: for such a reader the knowledge in Sections 2.2 and 2.3 (excluding the first subsection) is sufficient to follow this book. In Section 2.3, we present Boolean and quantum circuits (the latter as an extension of the concept of reversible circuits) as models of computation. Because of their descriptional simplicity, circuits are used throughout this book to present quantum algorithms. Chapter 3 is devoted to a complete presentation of Shor's famous factorization algorithm. The instructions in Chapter 3 will help the reader choose which sections may, according to the reader's background, be skipped. Chapter 4 is closely connected to Chapter 3, and can be seen as a presentation of a structural but not straightforward extension of Shor's algorithm. Chapter 5 is written to introduce Grover's method for obtaining a quadratic speedup in quantum computation (with respect to classical computation) in a very basic form, whereas in Chapter 6, we present a method for obtaining lower bounds for quantum computation in a restricted quantum circuit model. Chapters 7 and 8 are appendices intended for a beginner, but Chapter 7 is also suitable for a reader who has a strong background in computer science and is interested in quantum computation. Chapter 8 is composed of various different topics in mathematics, since it has already turned out that, in the area of quantum computation, many mathematical disciplines, seemingly separate from each other, are useful. Moreover, my personal experience is that a basic education in computer science and physics very seldom covers all the areas in Chapter 8.
This book concentrates mainly on quantum algorithms; other interesting topics, such as quantum information theory, quantum communication, quantum error-correcting codes, and quantum cryptography, are not covered. Therefore, for additional reading, we can warmly recommend [32] by Jozef Gruska. Book [32] also contains a large number of references to works on quantum computing. A reader who is more oriented to physics may also see [70] and [71] by C. P. Williams and S. H. Clearwater. It may also be useful to follow the Los Alamos preprint archive at http://xxx.lanl.gov/archive/quant-ph to learn about new developments in quantum computing. The Los Alamos preprint archive contains a large number of articles on quantum computing written since 1994, and includes many articles referred to in this book. For further information on this book, I am maintaining a Web page, in particular for the collection of errata and any other comments from the readers. The link to this page is available via the Springer catalogue Web site: http://www.springer.de/cgi-bin/search_book.pl?isbn=3-540-66783-0

Acknowledgements. My warmest thanks go to Professors Grzegorz Rozenberg and Arto Salomaa for encouraging me to write this book, and to Professor Juhani Karhumäki and the Turku Centre for Computer Science for providing excellent working conditions during the writing period. This work has been supported by the Academy of Finland under grants 14047 and 44087. I am also indebted to Docent P. J. Lahti, V. Halava, and M. Rönkä for their careful revision work on parts of this book. Thanks also go to Dr. Hans Wössner for a fruitful cooperation.

Turku, Finland, February 2001
Mika Hirvensalo

Contents

1. Introduction
   1.1 A Brief History of Quantum Computation
   1.2 Classical Physics
   1.3 Probabilistic Systems
   1.4 Quantum Mechanics

2. Devices for Computation
   2.1 Classical Computational Models
       2.1.1 Turing Machines
       2.1.2 Probabilistic Turing Machines
       2.1.3 Multitape Turing Machines
   2.2 Quantum Information
       2.2.1 Quantum Bits
       2.2.2 Quantum Registers
       2.2.3 More on Quantum Information
       2.2.4 Quantum Turing Machines
   2.3 Circuits
       2.3.1 Boolean Circuits
       2.3.2 Reversible Circuits
       2.3.3 Quantum Circuits

3. Fast Factorization
   3.1 Quantum Fourier Transform
       3.1.1 General Framework
       3.1.2 Hadamard-Walsh Transform
       3.1.3 Quantum Fourier Transform in Z_n
       3.1.4 Complexity Remarks
. ... . . . .. . . . . .. .. . . .. 3.2 Shor's Algorithm for Factoring Numbers. . . . . . . . . . . . . . . . . .. 3.2.1 From Periods to Factoring. . . . . . . . . . . . . . . . . . . . . . . .. 3.2.2 Orders of the Elements in Zn ...................... 3.2.3 Finding the Period. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 3.3 The Correctness Probability. . . . . . . . . . . . . . . . . . . . . . . . . . . .. 3.3.1 The Easy Case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 3.3.2 The General Case ............................ . . .. 41 41 41 43 44 49 50 50 52 54 56 56 57 X Contents 3.3.3 The Complexity of Shor's Factorization Algorithm. . .. 61 3.4 Excercises............................................. 62 4. Finding the Hidden Subgroup ............................ 4.1 Generalized Simon's Algorithm. . . . . . . . . . . . . . . . . . . . . . . . . .. 4.1.1 Preliminaries.................................... 4.1.2 The Algorithms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 4.2 Examples.............................................. 4.2.1 Finding the Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 4.2.2 Discrete Logarithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 4.2.3 Simon's Original Problem ......................... 4.3 Exercises.............................................. 63 64 64 65 69 69 69 70 70 5. Grover's Search Algorithm. ... .. ... . . .. . . . . .. . . .. . . .. . . .. 5.1 Search Problems ....................................... 5.1.1 Satisfiability Problem. . . . . . . . . . . . . . . . . . . . . . . . . . . .. 5.1.2 Probabilistic Search .............................. 5.1.3 Quantum Search with One Query. . . . . . . . . . . . . . . . .. 5.2 Grover's Amplification Method. . . . . . . . . . . . . . . . . . . . . . . . . .. 5.2.1 Quantum Operators for Grover's Search Algorithm. .. 5.2.2 Amplitude Amplification . . . . . . . . . . . . . . . . . . . . . . . . .. 5.2.3 Analysis of Amplification Method .................. 5.3 Utilizing Grover's Search Method. . . . . . . . . . . . . . . . . . . . . . . .. 5.3.1 Searching with Unknown Number of Solutions ....... 73 73 73 74 76 79 79 80 84 87 87 6. Complexity Lower Bounds for Quantum Circuits ......... 6.1 General Idea. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 6.2 Polynomial Representations. . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 6.2.1 Preliminaries.................................... 6.2.2 Bounds for the Representation Degrees. . . . . . . . . . . . .. 6.3 Quantum Circuit Lower Bound. . . . . . . . . . . . . . . . . . . . . . . . . .. 6.3.1 General Lower Bound. . . . . . . . . . . . . . . . . . . . . . . . . . . .. 6.3.2 Some Examples .................................. 91 91 92 92 96 98 98 101 7. Appendix A: Quantum Physics ........................... 7.1 A Brief History of Quantum Theory ...................... 7.2 Mathematical Framework for Quantum Theory ............. 7.2.1 Hilbert Spaces ................................... 7.2.2 Operators ....................................... 7.2.3 Spectral Representation of Self-adjoint Operators ..... 7.2.4 Spectral Representation of Unitary Operators ........ 7.3 Quantum States as Hilbert Space Vectors .................. 7.3.1 Quantum Time Evolution ......................... 7.3.2 Observables ..................................... 7.3.3 The Uncertainty Principles ........................ 103 103 105 107 108 109 111 115 116 118 121 Contents 8. 
   7.4 Quantum States as Operators
       7.4.1 Density Matrices
       7.4.2 Subsystem States
       7.4.3 More on Time Evolution
       7.4.4 Representation Theorems
   7.5 Exercises

8. Appendix B: Mathematical Background
   8.1 Group Theory
       8.1.1 Preliminaries
       8.1.2 Subgroups, Cosets
       8.1.3 Factor Groups
       8.1.4 Group Z_n*
       8.1.5 Group Morphisms
       8.1.6 Direct Product
   8.2 Fourier Transforms
       8.2.1 Characters of Abelian Groups
       8.2.2 Orthogonality of the Characters
       8.2.3 Discrete Fourier Transform
       8.2.4 The Inverse Fourier Transform
       8.2.5 Fourier Transform and Periodicity
   8.3 Linear Algebra
       8.3.1 Preliminaries
       8.3.2 Inner Product
   8.4 Number Theory
       8.4.1 Euclid's Algorithm
       8.4.2 Continued Fractions
   8.5 Shannon Entropy and Information
       8.5.1 Entropy
       8.5.2 Information
   8.6 Exercises

References

Index

1. Introduction

1.1 A Brief History of Quantum Computation

In connection with computational complexity, it could be stated that the theory of quantum computation was launched at the beginning of the 1980s. The famous physicist, Nobel Prize winner Richard P. Feynman, proposed in his article [30], which appeared in 1982, that a quantum physical system of R particles cannot be simulated by an ordinary computer without an exponential slowdown in the efficiency of the simulation. However, a system of R particles in classical physics can be simulated well with only a polynomial slowdown. The main reason for this is that the description size of a particle system is linear in R in classical physics (one should give the coordinates and the momentum of each particle with the required precision), but exponential in R according to quantum physics (in Section 1.4 we will learn about the quantum physical description). Feynman himself expressed it as follows:

  But the full description of quantum mechanics for a large system with R particles is given by a function ψ(x₁, x₂, …, x_R, t) which we call the amplitude to find the particles x₁, …, x_R, and therefore, because it has too many variables, it cannot be simulated with a normal computer with a number of elements proportional to R or proportional to N. [30]

Feynman also suggested that this slowdown could be avoided by using a computer running according to the laws of quantum physics. This idea suggests, at least implicitly, that a quantum computer could operate exponentially faster than a deterministic classical one.
In [30], Feynman also addressed the problem of simulating a quantum physical system with a probabilistic computer but, due to interference phenomena, this also appears to be a difficult problem. Quantum mechanical computation models were also constructed by Benioff [5] in 1982, but Deutsch argued in [23] that Benioff's model can be perfectly simulated by an ordinary computer. In 1985, in his notable paper [23], Deutsch was the first to establish a solid ground for the theory of quantum computation by introducing a fully quantum model for computation and giving the description of a universal quantum computer. Later, Deutsch also defined quantum networks in [24]. The construction of a universal quantum Turing machine was improved by Bernstein and Vazirani in [9], where the authors show how to construct a universal quantum Turing machine capable of simulating any other quantum Turing machine with polynomial efficiency.

After the pioneering work of D. Deutsch, quantum computation still remained a marginal curiosity in the theory of computation until 1994, when Peter W. Shor introduced his celebrated quantum algorithms for factoring integers and extracting discrete logarithms in polynomial time [63]. The importance of Shor's algorithm for finding factors is well known: the reliability of the famous RSA cryptosystem designed for secret communication is based on the assumption that factoring large integers will remain an intractable problem, but Shor demonstrated that this is not true if one could build a quantum computer. However, the theory is far more developed than the practice: a large-scale quantum computer has not been built yet. A major difficulty arises from two somewhat contradictory requirements. On the one hand, the computer memory, consisting of a microscopically small quantum system, must be isolated as perfectly as possible to prevent destructive interaction with the environment. On the other hand, the "quantum processing unit" cannot be totally isolated, since the computation must carry on, and a "supervisor" should ensure that the quantum system evolves in the desired way.

In principle, the problem that non-controllable errors occur is not a new one: in classical information theory we consider a noisy channel that may corrupt the messages. The task of the receiver is to extract the correct information from the distorted message without any additional information transmission. The classical theory of error-correcting codes [43] addresses this problem, and the elementary result of this theory, originally due to Claude Shannon, could be expressed as follows: for a reasonably erratic channel, there is a coding system for messages which allows us to decrease the error probability in the transmission as much as we want. It was first believed that a corresponding scheme for quantum computations was impossible even in theory, mainly because of the No-Cloning Theorem [72], which states that quantum information cannot be duplicated exactly. However, Shor demonstrated in his article [64] how we can construct error-correcting schemes for quantum computers, thus establishing the theory of quantum error-correcting codes. To learn more about this theory, the reader may consult [16], for example. In article [55], J. Preskill emphasizes that quantum error-correcting codes may someday lead to the construction of a large-scale quantum computer, but this remains to be seen.
1.2 Classical Physics

Physics, as we understand it today, is the theory of all of nature. This theory is naturally too broad to be profoundly accessed in a brief moment, so we will be satisfied just to point out some essential features that will be important when learning about the differences between quantum and classical computation. At its very core, physics is ultimately an empirical science in the sense that a physical theory can be regarded as valid only if the theory agrees with empirical observations. Therefore, it is not surprising that the concept of observables (in classical mechanics, observables are usually called dynamic variables) has great importance in the physical sciences. There are observables associated with a physical system, like position and momentum, to mention a few. The description of a physical system is called the state of the system.

Example 1.2.1. Assume that we would like to describe the mechanics of a single particle X in a closed region of space. The observables used for the system description are the position and the momentum. Thus we can, under a fixed coordinate system, express the state of the system as a vector x = (x₁, x₂, x₃, p₁, p₂, p₃) ∈ ℝ⁶, where (x₁, x₂, x₃) describes the position and (p₁, p₂, p₃) the momentum. As the particle moves, the state of the system changes in time. The way in which classical mechanics describes the time evolution of the state is given by the Hamiltonian equations of motion:

\[
\frac{d}{dt}x_i = \frac{\partial}{\partial p_i}H, \qquad
\frac{d}{dt}p_i = -\frac{\partial}{\partial x_i}H,
\]

where H = H(x₁, x₂, x₃, p₁, p₂, p₃) is an observable called the Hamiltonian function of the system. Suppose now that another particle Y is inserted into our system. Then the full description of the system is given as a vector (x, y) ∈ ℝ⁶ × ℝ⁶, where x is as before, and y describes the position and the momentum of the second particle.

To build the mathematical description of a physical system, it is sufficient to assume that all the observables take their values in a set whose cardinality does not exceed the cardinality of the real number system. Therefore, we can assume that the observables take real number values; if it is natural to think that a certain observable takes values in some other set A, we can always replace the elements of A with suitable real numbers. Keeping this agreement in mind, we can list the properties of a description of a physical system.

• A physical system is described by a state vector (also called a state) x ∈ ℝᵏ for some k. The set of states is called the phase space of the system. The state is the full description of the dynamic properties of interest. This means that the state completely describes the properties whose development we actually want to describe. Other physical properties, such as electrical charge, temperature, etc., are not included in the state if they do not alter or do not affect the properties of interest. It should be emphasized here that, in the previous example, we were interested in describing the mechanics of a particle, so the properties of interest are its position and momentum. However, it may be argued that if we investigate a particle in an electrical field, the electrical charge of the particle definitely affects its behaviour: the greater the charge, the greater the acceleration. But in this case, the charge is seen as a constant property of the system and is not encoded in the state.
• The state depends on time, so instead of x we should actually write x(t) or xₜ.
If the system is regular enough, as classical mechanics is, the state also determines the future states (and also the past states). That is, we can find a time-dependency which is a function Uₜ such that x(t) = Uₜ(x(0)). This Uₜ, of course, depends on the system itself, as well as on the constant properties of the system. In our first example, Uₜ is determined by the Hamiltonian via the Newtonian equations of motion.
• If two systems are described by states x and y, the state of the compound system consisting of both systems can be written as (x, y). That is, the description of the compound system is the Cartesian product of the subsystem descriptions.

1.3 Probabilistic Systems

In order to be able to comment on some characteristic features of the representation of quantum systems, we first study the representation of a probabilistic system. Saying that a system has a probabilistic nature means that we do not know the state of the system for certain, but we do know the probability distribution over the states. In other words, we know that the system is in states x₁, …, xₙ with probabilities p₁, …, pₙ that sum up to 1 (if there is a continuum of possible states, as in Example 1.2.1, we should have a probability distribution that integrates to 1, but for the sake of simplicity, we will study here only systems having finitely many states). The notation

p₁[x₁] + p₂[x₂] + ⋯ + pₙ[xₙ],   (1.1)

where pᵢ ≥ 0 and p₁ + ⋯ + pₙ = 1, stands for a probability distribution, meaning that the system is in state xᵢ with probability pᵢ. We also call the distribution (1.1) a mixed state. Hereafter, the states xᵢ are called pure states. Now make a careful distinction between the notions: expression (1.1) does not mean the expectation value (also called the average) of the state,

p₁x₁ + p₂x₂ + ⋯ + pₙxₙ,   (1.2)

but (1.1) is only the probability distribution over the states xᵢ.

Example 1.3.1. Tossing a fair coin will give heads h or tails t, each with a probability of ½. According to classical mechanics, we may think that, in principle, perfect knowledge about the coin and all circumstances connected to the tossing would allow us to determine the outcome with certainty. However, in practice it is impossible to handle all these circumstances, and the notation ½[h] + ½[t] for the mixed state of a fair coin reflects our lack of information about the system.

Using an auxiliary system, we can easily introduce a probabilistic nature for any system. In fact, this is what we usually do in connection with probabilistic algorithms: the algorithm itself is typically thought to be strictly deterministic, but it utilizes random bits, which are supposed to be freely available. (In classical computation, generating random bits is a very complicated issue; for further discussion on this topic, consult Section 11.3 of [48].)

Example 1.3.2. Let us assume that the time evolution of a system with pure states x₁, …, xₙ also depends on an auxiliary system with pure states h and t, such that the compound system state (xᵢ, h) evolves during a fixed time interval into (x_{h(i)}, h) and (xᵢ, t) into (x_{t(i)}, t), where h, t : {1, …, n} → {1, …, n} are some functions. The auxiliary system with states h and t can thus be interpreted as a control system which indicates how the original one should behave. Let us then consider the control system in a mixed state p₁[h] + p₂[t] (if p₁, p₂ ≠ ½, we may call the control system a biased coin). The compound state can then be written as a mixture p₁[(xᵢ, h)] + p₂[(xᵢ, t)], which evolves into the mixed state p₁[(x_{h(i)}, h)] + p₂[(x_{t(i)}, t)]. If the auxiliary system no longer interferes with the first one, we can ignore it and write the state of the first system as p₁[x_{h(i)}] + p₂[x_{t(i)}]. A control system in a mixed state is called a randomizer. Supposing that randomizers are always available, we may even assume that the system under consideration evolves probabilistically and ignore the randomizer. Using a more general concept of a randomizer, we can achieve notational and conceptual simplification by assuming that the time evolution of a system is not a deterministic procedure, but develops each state xᵢ into a distribution

p_{1i}[x₁] + p_{2i}[x₂] + ⋯ + p_{ni}[xₙ]   (1.3)
Supposing that the randomizers are always available, we may even assume that the system under consideration evolves probabilistic ally and ignore the randomizer. Using a more general concept of randomizer, we can achieve notational and conceptual simplification by assuming that the time evolution of a system is not a deterministic procedure, but develops each state Xi into a distribution (1.3) 3 In classical computation, generating random bits is a very complicated issue. For further discussion on this topic, consult section 11.3 of [48]. 1. Introduction 6 such that Pli +P2i + .. .+Pni = 1 for each i. In (1.3), Pji is the probability that the system state Xi evolves into Xj. Notice that we have now also made the time discrete in order to simplify the mathematical framework, and that this actually is well-suited to the computational aspects: we can assume that we have instantaneous descriptions of the system between short time intervals, and that during each interval, the system undergoes the time evolution (1.3). Of course, the time evolution does not always need to be the same, but rather may depend on the particular interval. Considering this probabilistic time evolution, the notation (1.1) appears to be very handy: during a time interval, a distribution (1.4) evolves into + ... + Pnl [XnJ) + ... + Pn (PIn [XIJ + ... + Pnn [XnJ) = (Pl1PI + ... + PlnPn) [XIJ + ... + (PnIPI + ... + PnnPn) [XnJ = p~[XIJ + p~[X2J + ... + P~[Xn], where we have denoted p~ = PilPI + ... + PinPn' The probabilities Pi and p~ PI (Pl1 [XIJ are thus related by PJ) P2 ( ·· · p~ - (Pl1 Pl2 ... PIn) P21 P22 P2n (PI) P2 . . .. .. .. . ' . ' Pnl Pn2 ... Pnn . . ' . (1.5) Pn Notice that the matrix in (1.5) has non-negative entries and Pli + P2i + ... + Pni =-1 for each i, which guarantees that pi +P2+" .+p~ = PI +P2+·· '+Pn' A matrix with this property is called a Markov matrix. A probabilistic system with a time evolution described above is called a Markov chain. Notice that a distribution (1.4) can be considered as a vector with nonnegative coordinates that sum up to 1 in an n-dimensional real vector space having [xIJ, ... , [xnJ as basis vectors. The set of all mixed states (distributions) is a convex set,4 having the pure states as extremals. Unlike in the representation of quantum systems, the fact that the mixed states are elements of a vector space is not very important. However, it may be convenient to describe the probabilistic time evolution as a Markov mapping, i.e., a linear mapping which preserves the property that the coordinates are non-negative and sum up to 1. For an introduction to Markov chains, see [58]' for instance. 4 A set S of vectors is convex, if for all Xl, X2 E S and all PI, P2 2: 0 such that + P2 = 1 also PlXl + P2X2 E S. An element xES of a convex set is an extremal, if X = PlXl + P2X2 implies that either PI = 0 or P2 = 0_ PI 1.4 Quantum Mechanics 7 1.4 Quantum Mechanics In the beginning of the twentieth century, experiments on atoms and radiation physics gave strong support to the idea that there are physical systems that cannot be satisfactorily described even by using the Markov chain representation of the previous section. In Section 7.1 we will discuss these experiments in more detail, but here in the introductory chapter we are satisfied with only presenting the mathematical description of a quantum mechanical system. We would like to emphasize that this representation based on so-called state vectors which is studied here is not the most general representation of quantum systems. 
A general Hilbert space formalism of quantum mechanics is represented in Section 7.4. The advantage of a representation using only state vectors is that it is mathematically simpler than the more general one. To explain the terminology, we usually use "quantum mechanics" to mean the mathematical structure that describes "quantum physics" , which can be understood more generally. The quantum mechanical description of a physical system looks very much like the probabilistic representation (1.6) but still differs essentially from (1.6). In fact, in quantum mechanics a state of an n-level system is depicted as a unit-length vector in an n-dimensional complex vector space Hn (see Section 8.3). 5 We call Hn the state space of the system. To illustrate, let us choose an orthonormal basis {IX1),"" IXn)} for the state space Hn (we assume here that n is a finite number). This strange "ket"-notation Ix) is originally due to P. Dirac, and its usefulness will become apparent later. Now, any state of our quantum system can then be written as (1.7) where O!i are complex numbers called the amplitudes (with respect to the chosen basis) and the requirement of unit length means that 10!112 + 10!212 + ... + IO!nl 2 = 1. It should be immediately emphasized that the choice of the orthonormal basis {IX1),"" IXn)} is arbitrary but, for any such fixed basis, refers to a physical observable which can take n values. To simplify the framework, we do not associate any numerical values with the observables in this section, but we merely say that the system can have properties Xl, .•. , X n . Numerical values associated to the observables are handled in Section 7.3.2. The amplitudes O!i induce a probability distribution in the following way: the probability that a system in a state (1.7) is seen to have property Xi is IO!il 2 (we also say that the 5 A finite-dimensional vector space over complex numbers is an example of a Hilbert space. 8 1. Introduction probability that Xi is observed is lai). The basis vectors IXi) are called the basis states and (1.7) is referred to as a superposition of basis states. Mapping 'I/J(Xi) = ai is called the wave function with respect to basis IX1), ... , Ix n ). Even though this state vector formalism is simpler than the general one, it has one inconvenient feature: if Ix) ,Iy) E Hn are any states that satisfy Ix) = eilJ Iy), we say that states Ix) and Iy) are equivalent. Clearly equivalent states induce the same probability distribution over basis states (with respect to any chosen basis), and we usually identify equivalent states. Example 1.4.1. A two-level quantum mechanical system can be used to represent a bit, and such a system is called a quantum bit. For such a system we choose an orthonormal basis {IO), II)}, and a general state of the system is ao 10) + aliI) , where laol 2+ lal1 2= 1. The above superposition induces a probability distribution such that the probabilities that the system is seen to have properties o and 1 are laol 2and lal1 2respectively. What about the time evolution of quantum systems? The time evolution of a probabilistic system via Markov matrices is replaced by matrices with complex number entries that preserve quantity lal1 2+ ... + lan l2. Thus, the quantum system in a state evolves during a time interval into the state where amplitudes a!, ... , an and ai, ... , a} a2 ) ( ·· · a~ - 1n 1 (au a21 a12 a22 ... aa2n ) (a a2 ) . '. . ' . .. ... . anI an2 ... ann a~ . . , . are related by (1.8) an and lail2 + ... + la~12 = lal1 2+ ... + lan l2. 
It turns out that the matrices satisfying this requirement are unitary matrices (see Section 7.3.1 for details). Unitarity of a matrix A means that the transpose complex conjugate of A, denoted by A *, is the inverse matrix to A. Matrix A * is also called the adjoint matrix 6 of A. By saying that the time evolution of quantum systems is unitary, several authors mean that the evolution of quantum systems are determined by unitary matrices. Notice that unitary time evolution has very 6 Among physics-oriented authors, there is also a widespread tradition of denoting the adjoint matrix by At. 1.4 Quantum Mechanics 9 interesting consequences: unitarity especially means that the time evolution of a quantum system is invertible. If fact, (aI, ... , an) can be perfectly recovered from (a~, ... ,a~), since the matrix in (1.8) has an inverse." We interrupt the description of quantum systems for a moment to discuss the differences between probabilistic and quantum systems. As we have said before, a mixed state (probability distribution) (1.9) of a probabilistic system and a superposition (1.10) of a quantum system formally resemble each other very closely, and therefore it is quite natural if (1.10) can be seen as a generalization of (1.9). At first glance, the interpretation that (1.10) induces the probability distribution such that lail 2 is the probability of observing Xi may only seem like a technical difference. Can (1.10) also be interpreted to represent our ignorance such that the system in state (1.10) is actually in some state IXi} with a probability of lail 2 ? The answer is absolutely no. A fundamental difference can be found even by recalling that the orthonormal basis of Hn can be chosen freely. We can, in fact, choose an orthonormal basis Ix~}, ... , Ix~} such that so, with respect to the new basis, the state of the system is simply Ix~}. The new basis refers to another physical observable, and with respect to this observable, the system may have some of the properties x~, ... , x~. But in the state IxD, the system is seen to have property x~ with a probability of 1. Example 1.4.2. Consider a quantum system with two basis states t and h (of course this system is the same as a quantum bit, only the terminology is different for illustration). Here we call this system a quantum coin. We consider a time evolution 1 1 1 1 Ih} H V2 Ih} + V2 It} It} H V2 Ih} - V2 It} , which we here call a fair coin toss (verify that the time evolution is unitary). Notice that, beginning with either state Ih} or It}, the state after the toss is one of the above states on the right-hand side, and that both of them have the property that hand t will both be observed with a probability of Imagine then that we begin with state Ih} and perform the fair coin toss twice. After the first toss, the state is as above, but if we do not observe the system, the !. 10 1. Introduction state will be Ih) again after the second toss (verify). The phenomenon that t cannot be observed after the second toss clearly cannot take place in any probabilistic system. Remark 1.4.1. In quantum mechanics, a state (1.10) is a pure state: it contains the maximal information of the system properties. The mixed states of quantum systems are discussed in Section 7.4. Let us now continue the description of quantum systems by introducing two systems with basis states IX1), ... , IXn) and IY1), ... , IYm) respectively. 
Let us now continue the description of quantum systems by introducing two systems with basis states |x₁⟩, …, |xₙ⟩ and |y₁⟩, …, |yₘ⟩, respectively. As in the case of classical and probabilistic systems, the basis states of the compound system can be thought of as pairs (|xᵢ⟩, |yⱼ⟩). It is natural to represent the general states of the compound system as

\[
\sum_{i=1}^{n}\sum_{j=1}^{m} \alpha_{ij}\,|x_i, y_j\rangle,
\tag{1.11}
\]

where \(\sum_{i=1}^{n}\sum_{j=1}^{m} |\alpha_{ij}|^2 = 1\). A natural mathematical framework for representations (1.11) is the tensor product of the state spaces of the subsystems, and we will now briefly explain what is meant by a tensor product. Let Hₙ and Hₘ be two vector spaces with bases {x₁, …, xₙ} and {y₁, …, yₘ}. The tensor product of the spaces Hₙ and Hₘ is denoted by Hₙ ⊗ Hₘ. The space Hₙ ⊗ Hₘ has the ordered pairs (xᵢ, yⱼ) as a basis; thus Hₙ ⊗ Hₘ has dimension mn. We also denote (xᵢ, yⱼ) = xᵢ ⊗ yⱼ and say that xᵢ ⊗ yⱼ is the tensor product of the basis vectors xᵢ and yⱼ. Tensor products of vectors other than basis vectors are defined by requiring that the product is bilinear:

\[
\Bigl(\sum_{i=1}^{n}\alpha_i x_i\Bigr) \otimes \Bigl(\sum_{j=1}^{m}\beta_j y_j\Bigr)
= \sum_{i=1}^{n}\sum_{j=1}^{m} \alpha_i \beta_j\, x_i \otimes y_j .
\]

Since the vectors xᵢ ⊗ yⱼ form the basis of Hₙ ⊗ Hₘ, the notion of the tensor product of vectors is perfectly well established, but notice carefully that the tensor product is not commutative. In connection with representing quantum systems, we usually omit the symbol ⊗ and use notations closer to the original idea of regarding xᵢ ⊗ yⱼ as a pair (xᵢ, yⱼ): |xᵢ⟩ ⊗ |yⱼ⟩ = |xᵢ⟩|yⱼ⟩ = |xᵢ, yⱼ⟩. If (1.11) can be represented as

\[
\sum_{i=1}^{n}\sum_{j=1}^{m} \alpha_{ij}\,|x_i, y_j\rangle
= \Bigl(\sum_{i=1}^{n}\alpha_i\,|x_i\rangle\Bigr)\Bigl(\sum_{j=1}^{m}\beta_j\,|y_j\rangle\Bigr),
\]

we say that the compound system is in a decomposable state. Otherwise, the system state is entangled. It is plain to verify that the notion of "decomposable state" is independent of the bases chosen for each space. Since H_l ⊗ (Hₘ ⊗ Hₙ) is clearly isomorphic to (H_l ⊗ Hₘ) ⊗ Hₙ, we can omit the parentheses and refer to this tensor product as H_l ⊗ Hₘ ⊗ Hₙ. Thus, we can inductively define tensor products of more than two spaces.

Example 1.4.3. A system of m quantum bits is described by the m-fold tensor product of the two-dimensional Hilbert space H₂. Thus the system has basis states |x₁⟩|x₂⟩⋯|xₘ⟩, where (x₁, …, xₘ) ∈ {0, 1}ᵐ. The dimension of this space is 2ᵐ, and the system of m qubits is referred to as a quantum register of length m. A small numerical sketch of tensor products and entanglement is given at the end of this section.

We conclude this section by listing the key features of a finite-state quantum system in the state-vector formalism.

• The basis states of an n-level quantum system form an orthonormal basis of the system state space, the Hilbert space Hₙ. This basis can be chosen freely, and it refers to a particular observable.
• A state of a quantum system is a unit-length vector

α₁|x₁⟩ + α₂|x₂⟩ + ⋯ + αₙ|xₙ⟩.   (1.12)

These states are called superpositions of basis states, and they induce a probability distribution such that, when one observes (1.12), property xᵢ is seen with probability |αᵢ|².
• If two quantum systems are depicted by using state spaces Hₙ and Hₘ, then the state space of the compound system consisting of these two subsystems is described by the tensor product Hₘ ⊗ Hₙ.
• The states of quantum systems change via unitary transformations.
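As mentioned in Example 1.4.3, here is a small numerical sketch of compound states (Python with NumPy; np.kron computes the coordinates of a tensor product in the product basis, and the rank test at the end is a standard criterion for two-qubit decomposability, not a construction taken from the text):

```python
import numpy as np

# Basis states of a single quantum bit.
ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# A decomposable two-qubit state: the tensor product of two
# one-qubit superpositions, computed with the Kronecker product.
plus = (ket0 + ket1) / np.sqrt(2)
decomposable = np.kron(plus, plus)
print(decomposable)              # all four amplitudes equal 1/2

# An entangled state: (|00> + |11>)/sqrt(2) cannot be written as
# a tensor product of one-qubit states.
entangled = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)

# For two qubits, a state with amplitudes a_ij is decomposable
# exactly when the 2x2 matrix (a_ij) has rank 1.
print(np.linalg.matrix_rank(decomposable.reshape(2, 2)))   # 1
print(np.linalg.matrix_rank(entangled.reshape(2, 2)))      # 2
```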
2. Devices for Computation

To study computational processes, we have to fix a computational device first. In this chapter, we study Turing machines and circuits as models of computation. We use the standard notations of formal language theory and present these notations now briefly. An alphabet is any set A. The elements of an alphabet A are called letters. The concatenation of sets A and B is the set AB consisting of strings formed of any element of A followed by any element of B. Especially, Aᵏ is the set of strings of length k over A. These strings are also called words. The concatenation w₁w₂ of words w₁ and w₂ is just the word w₁ followed by w₂. The length of a word w is denoted by |w| or ℓ(w) and is defined as the number of letters that constitute w. We also define A⁰ to be the set that contains only the empty word ε having no letters, and A* = A⁰ ∪ A¹ ∪ A² ∪ ⋯ is the set of all words over A. Mathematically speaking, A* is the free monoid generated by the elements of A, having concatenation as the monoid operation and ε as the unit element.

2.1 Classical Computational Models

2.1.1 Turing Machines

The classic model used to describe computational processes is the Turing machine, or TM for short. To give a description of a Turing machine, we fix a finite set of basic information units called the alphabet A, a finite set of internal control states Q, and a transition function

δ : Q × A → Q × A × {−1, 0, 1}.   (2.1)

Definition 2.1.1. A (deterministic) Turing machine M over alphabet A is a sixtuple (Q, A, δ, q₀, qₐ, qᵣ), where q₀, qₐ, qᵣ ∈ Q are the initial, accepting, and rejecting states, respectively, and δ is as above.

The reason for calling the Turing machine of the above definition deterministic lies in the form of the transition function δ, which describes the dynamics (the computation) of the machine. Later we will introduce some other forms of Turing machines, but for now, we will give an interpretation for the above definition, mainly using the notations of [48]. As a configuration of a Turing machine we understand a triplet (q₁, x, y), where q₁ ∈ Q and x, y ∈ A*. For a configuration c = (q₁, x, y), we say that the Turing machine is in state q₁ and that the tape contains the word xy. If the word y begins with letter a₁, we say that the machine is scanning letter a₁ or reading letter a₁. To describe the dynamics determined by the transition function δ, we write x = w₁a and y = a₁w₂, where a, a₁ ∈ A and w₁, w₂ ∈ A*. Thus, c can be written as c = (q₁, w₁a, a₁w₂), and the transition function δ also defines a transition rule from one configuration to another in a very natural way: if δ(q₁, a₁) = (q₂, a₂, d), then the configuration c can be transformed into

(q₂, w₁, aa₂w₂), (q₂, w₁a, a₂w₂), or (q₂, w₁aa₂, w₂),

depending on whether d = −1, d = 0, or d = 1, respectively. This means that, under the transition δ(q₁, a₁) = (q₂, a₂, d), the symbol a₁ being scanned is replaced with a₂ (we say that the machine prints a₂ to replace a₁); the machine enters state q₂ and begins to scan the symbol to the left of the previously scanned symbol, continues scanning the same location, or scans the symbol to the right of the previously scanned symbol, depending on d ∈ {−1, 0, 1}. We also say that the Turing machine's read-write head moves to the left, remains stationary, or moves to the right. If δ defines a transition from c to c′, we write c ⊢ c′ and say that c′ is a successor of c, that c yields c′, and that c is followed by c′. The transition from c to c′ is called a computational step. The anomalous cases c = (q₁, w, ε) and c = (q₁, ε, w) (recall that ε was defined to be the empty word having no letters at all) are treated by introducing a blank symbol ⊔, extending the definition of δ, and replacing (q₁, w, ε) (resp. (q₁, ε, w)) with (q₁, w, ⊔) (resp. (q₁, ⊔, w)) if necessary. A computation of a Turing machine with input w ∈ A* is defined as a sequence of configurations c₀, c₁, c₂, … such that c₀ = (q₀, ε, w) and cᵢ ⊢ cᵢ₊₁ for each i.
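To make the configuration semantics concrete, here is a minimal sketch (Python; representing a configuration as a (state, left, right) triple of strings mirrors the triplet (q₁, x, y) above, and the toy transition table is a hypothetical example, not a machine from the text):

```python
BLANK = " "   # stand-in for the blank symbol

def step(delta, config):
    """Apply the transition function once to a configuration (q, x, y)."""
    q, x, y = config
    if not y:                         # anomalous case (q, w, epsilon)
        y = BLANK
    q2, a2, d = delta[(q, y[0])]      # delta(q1, a1) = (q2, a2, d)
    if d == -1:                       # head moves to the left
        if not x:                     # anomalous case (q, epsilon, w)
            x = BLANK
        return (q2, x[:-1], x[-1] + a2 + y[1:])
    if d == 0:                        # head remains stationary
        return (q2, x, a2 + y[1:])
    return (q2, x + a2, y[1:])        # head moves to the right

# A toy machine that scans right over 0s and 1s and halts in the
# accepting state 'qa' at the first blank.
delta = {("q0", "0"): ("q0", "0", 1),
         ("q0", "1"): ("q0", "1", 1),
         ("q0", BLANK): ("qa", BLANK, 0)}

config = ("q0", "", "0110")           # initial configuration (q0, eps, w)
while config[0] != "qa":
    config = step(delta, config)
    print(config)
```

Each printed triple is one computational step c ⊢ c′ in the sense just defined.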
We say that the computation halts if some cᵢ has no successors, or if the state symbol of the configuration cᵢ is either qₐ or qᵣ. In the former case, the computation is accepting, and in the latter case, it is rejecting. If a computation of a Turing machine T beginning with configuration (q₀, ε, w) leads to a halting configuration (q, w₁, w₂) in t computational steps, we say that T computes w₁w₂ from the input word w in time t. Thus, a Turing machine can be seen as a device computing (partial) functions A* → A*; but we can also ignore the output w₁w₂ and just say that the Turing machine T accepts the input w if it halts in an accepting state, and that T rejects the input w if it either does not halt or halts in a rejecting state. Thus, a Turing machine can also be seen as an acceptor that classifies words into those which are accepted and those which are rejected.

Definition 2.1.2. A set S of words over A is a recursively enumerable language if there is a Turing machine T such that T accepts w if and only if w ∈ S.

Definition 2.1.3. A set S is a recursive language if there is a Turing machine T such that each computation of T halts, and T accepts w if and only if w ∈ S.

The families of recursively enumerable and recursive languages are denoted by RE and R, respectively. By definition, R ⊆ RE, but it is a well-known fact that also R ≠ RE; see [48], for instance. It is a widespread belief that Turing machines capture the notion of algorithmic computability: whatever is algorithmically computable is also computable by a Turing machine. This strong belief should actually be called a hypothesis, and it is known as the Church–Turing thesis. In this book, we establish the notion of an algorithm by using Turing machines. Turing machines provide a clear notion of computational resources like the time and the space used during a computation. Moreover, by using Turing machines, it is easy to introduce the notions of probabilistic, nondeterministic, and quantum computation. On the other hand, it is worth noting that, for almost all practical purposes, the notion of the Turing machine is clumsy: even describing very simple algorithms by using Turing machines requires a lengthy and fairly non-intuitive description of the transition rules. For the above reasons, we will present the fundamental concepts of computability by using Turing machines, but we will also use more sophisticated notions of algorithms, including Boolean circuits, quantum circuits, and even pseudo-programming languages.

The above definitions of recursive and recursively enumerable languages represent two classes of algorithmic decision problems. When a problem is a decision problem, it means that we are given a word w as an input and we should decide whether w belongs to some particular language or not. It is clear that, for recursive languages, the Turing machine accepting the particular language offers an algorithm for solving this problem. However, for recursively enumerable languages the corresponding Turing machine does not provide a good algorithmic solution. The reason for this is that, even though the machine halts whenever the input w belongs to S, we do not know how long the computation lasts on individual input words w. Thus, we do not know how long we should wait until we can conclude that the machine will not halt; i.e., the decision w ∉ S cannot be made. We call all decision problems inside R algorithmically solvable, recursively solvable, or decidable. The decision problems outside R are called recursively unsolvable or undecidable. The classification into recursively solvable and unsolvable problems is nowadays far too rough. A typical way to introduce some refinement is to consider the computational time required to solve a particular problem. Therefore, we now define some basic concepts connected to measuring the computation time. In complexity theory, we are usually interested in measuring
The decision problems outside R are called recursively unsolvable or undecidable. The classification into recursively solvable and unsolvable problems is nowadays far too rough. A typical way to introduce some refinement is to consider the computational time required to solve a particular problem. Therefore, we now define some basic concepts connected to measuring the computation time. In complexity theory, we are usually interested in measuring 16 2. Devices for Computation the computation time only up to some multiplicative constant, and for that purpose, the following notations are useful: Let f and g be functions N -+ N. We write f = O(g), if there are constants c> 0 and no EN such that fen) ::; cg(n), whenever n ~ no. If fen) ~ cg(n) whenever n ~ no, we write f = neg). If f = O(g) and f = neg), then we write f = 8(g) and say that f and g are of the same order. Let M be a 'lUring machine that halts on each input and T( n) be the maximum computation time on inputs having length n. We say that T(n) is the time complexity function of M. A 'lUring machine M is a polynomialtime Turing machine if its time complexity function satisfies T(n) = O(nk) for some constant k. The family of decision problems that can be solved by polynomial-time 'lUring machines is denoted by P. Using widespread jargon, we say that the decision problems in Pare tmctable, while the other ones are intmctable. 2.1.2 Probabilistic Turing Machines We can modify the definition of a deterministic 'lUring machine to provide a model for probabilistic computation. The necessary modification is to replace the transition function with tmnsition probability distribution 8: Q x A x Q x A x {-1,0,1} -+ [0,1]. Value 8 (qI, a I , q2, a2, d) is interpreted as the pro babili ty that, when the machine in state ql reads symbol aI, it will print a2, enter the state q2, and move the head to direction d E {-I, 0,1}. Definition 2.1.4. A probabilistic Turing machine M over alphabet A is a sixtuple (Q,A,8,qo,qa,qr), where qo, qa, qr E Q are the initial, accepting, and rejecting states respectively. It is required that for all (ql, ad E Q x A ~ 8(ql,al,q2,a2,d) = 1. (Q2,a2,d)EQxAx {-I,O,I} From this time on, to avoid fundamental difficulties in the notion of computability, we will agree that all the values of 8(QI, aI, Q2, a2, d) are mtionaz.! The computation of a probabilistic 'lUring machine is not so straightforward a concept as the deterministic computation. We say that the configuration c = (QI,wla,alw2) yields (or is followed by) any configuration c' = (q2, WI, aa2w2) (resp. c' = (q2,wla,a2w2) and c' = (q2,wlaa2,w2)) with probability p = 8(QI, aI, q2, a2, 1) (resp. with probability p = 8(qI, aI, q2, a2, 0) and p = 8(ql, aI, q2, a2, -1)). If c yields c' with probability p, we write c f-p c'. Let Co, CI, ... , Ct be a sequence of configurations such that Ci f-Pi+l Ci+1 for 1 This agreement can, of course, be criticized, but the feasibility of a machine working with arbitrary real number probabilities is also questionable. 2.1 Classical Computational Models 17 each i. Then we say that Ct is computed from Co in t steps with probability PIP2 ... Pt. If PIP2 ... Pt i- 0, we also say that Co f-Pl CI f-P2 ..• f-P' Ct is a computation of a probabilistic Turing machine. Remark 2.1.1. A reader who thinks that the notion of probabilistic computation based on probabilistic Turing machines does not correspond very well to the idea of an algorithm utilizing random bits is perfectly right! 
In many ways a simpler model for probabilistic computations could have been given by using nondeterministic Turing machines. The reason for presenting this model is to make the traditional definition of a quantum Turing machine (see e.g. [9]) clearer. Unlike in a deterministic computation, a single configuration can now yield several different configurations with probabilities that sum up to 1. Thus, the total computation of a probabilistic Turing machine can be seen as a tree having configurations as nodes. The initial configuration is the root and each node C has configurations followed by C as descendants. Thus, a computation of the probabilistic machine in question is just a single branch in the total computation tree. All. computations can be expressed by the terms of the probabilistic systems we considered in the introductory chapter: assume that taking t steps, a probabilistic Turing machine starting from an initial configuration computes configurations CI, ..• , Cm with probabilities PI, ... , Pm such that PI + ... + Pm = 1. We can then say that the total configuration of a probabilistic machine at time t is a probability distribution (2.2) over the basis configurations cI, ... , em. As in the introductory chapter, we can define a vector space, which we now call the configuration space, that has all of the potential basis configurations as the basis vectors, and a general configuration (2.2) is a vector in the configuration space having non-negative coordinates that sum up to 1. On the other hand, there is now an essentially more complicated feature compared to the introductory chapter: there is a countable infinity of basis vectors, so our configuration space is countably infinite-dimensional. Regarding this latter representation of total configurations of a probabilistic Turing machine, the reader may already guess how to introduce the notion of quantum Turing machines, but we will still continue with probabilistic computations and study some acceptance models. When talking about time-bounded computation in connection with probabilistic Turing machines, we usually assume that all the computations have the same length. In other words, we assume that all the computations are synchronized so well that they reach a halting configuration at the same time, so that all the branches of the computation tree have the same length. We can thus regard a probabilistic Turing machine as a facility for computing the probability distribution of outcomes. For instance, if the purpose of a 18 2. Devices for Computation particular probabilistic machine is to solve a decision problem, then some of the computations may end up in an accepting state, but some may also end up in a rejecting state. What then do we mean by saying that a probabilistic machine accepts a string or a language? In fact, there are several different choices, some of which are given in the following definitions: • Class NP is the family of languages S that can be accepted in polynomial time by some probabilistic Turing machine M in the following sense: a word w is in the language S if and only if M accepts w with nonzero probability (see Remark 2.1.2). • Class RP is the family of languages S that can be accepted in polynomial time with some probabilistic Turing machine M in the following way: if w E S, then M accepts w with a probability of at least ~, but if w ~ S, then M always rejects w. • Class coRP is the class of languages consisting exactly of the complements of those in RP. 
Notice that coRP is not the complement of RP among all the languages.
• We define ZPP = RP ∩ coRP.
• Class BPP is the family of languages $S$ that are accepted by a probabilistic Turing machine $M$ such that if $w \in S$, then $M$ accepts with a probability of at least $\frac{2}{3}$, and if $w \notin S$, then $M$ rejects with a probability of at least $\frac{2}{3}$.

Remark 2.1.2. The definition of the class NP (standing for nondeterministic polynomial time) given here is not the usual one. A much more traditional definition would be given by using the notion of a nondeterministic Turing machine, which can be obtained from the probabilistic ones by ignoring the probabilities. In other words, a nondeterministic Turing machine has a transition relation $\delta \subseteq Q \times A \times Q \times A \times \{-1,0,1\}$, which tells whether it is possible for a configuration $c$ to yield another configuration $c'$. More precisely, the fact that $(q_1, a_1, q_2, a_2, d) \in \delta$ means that, if the machine is in state $q_1$ scanning symbol $a_1$, then it is possible to replace $a_1$ with $a_2$, move the head in direction $d$, and enter state $q_2$. This model resembles probabilistic computation very closely: indeed, the notion of "computing $c'$ from $c$ in $t$ steps with nonzero probability" is replaced with the notion that "there is a possibility to compute $c'$ from $c$ in $t$ steps", which does not seem to make any difference. The acceptance model for nondeterministic Turing machines also looks like the one that we defined for probabilistic NP-machines: a word $w$ is accepted if and only if it is possible to reach an accepting final configuration in polynomial time.

It can also be argued that the class NP does not correspond very well to our intuition of practical computation. For example, if each configuration yields two distinct ones, each with a probability of $\frac{1}{2}$, and the computation lasts $t$ steps, there are $2^t$ final configurations, each computed from the initial one with a probability of $\frac{1}{2^t}$. We say that the machine accepts a word if and only if at least one of these final configurations is accepting, but it may happen that only one final configuration is accepting, and we cannot practically distinguish the acceptance probabilities $\frac{1}{2^t}$ (accepting) and 0 (rejecting) without running the machine $\Omega(2^t)$ times. This would usually make the computation last exponential time since, if the machine reads the whole input, then $t \ge n$, where $n$ is the length of the input. By its very definition, each deterministic Turing machine is also a probabilistic one (always working with a probability of 1), and therefore P ⊆ NP. However, it is a long-standing open problem in theoretical computer science whether P = NP.

Remark 2.1.3. Class RP (randomized polynomial time) corresponds more closely to the notion of realizable effective computation. If $w \notin S$, then this fact is revealed by any computation. On the other hand, if $w \in S$, we learn this with a probability of at least $\frac{1}{2}$. This means that an RP-Turing machine has no false positives, but it may give a false negative, although with a probability of at most $\frac{1}{2}$. By repeating the whole computation $k$ times, the probability of making the false decision $w \notin S$ can thus be reduced to at most $2^{-k}$. A probabilistic Turing machine with RP-style acceptance is also called a Monte Carlo Turing machine.
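The repetition bound of Remark 2.1.3, and the majority-vote amplification mentioned below in Remark 2.1.5, are easy to tabulate. A small sketch (illustration only; the constants $\frac{1}{2}$ and $\frac{2}{3}$ are the ones used in the definitions above):

```python
from math import comb

# False-negative probability of an RP machine after k independent runs:
# each run misses a word in S with probability at most 1/2, so all k runs
# miss with probability at most 2^(-k).
for k in (1, 5, 10, 20):
    print(f"RP, {k} runs: error <= {2.0 ** -k:.2e}")

# BPP amplification: run a machine with success probability p = 2/3
# k times and take the majority vote.  The error is the probability
# that at most k//2 runs are correct (an exact binomial tail).
def majority_error(p, k):
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(k // 2 + 1))

for k in (5, 25, 125):
    print(f"BPP, {k} runs: error <= {majority_error(2/3, k):.2e}")
```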
Remark 2.1.4. If a language $S$ belongs to ZPP, then there are Monte Carlo Turing machines $M_1$ and $M_2$ accepting $S$ and the complement of $S$, respectively. By combining these two machines, we obtain an algorithm that can be run repeatedly to make the correct decision with certainty. For if $M_1$ ever gives the answer $w \in S$, it surely means that $w \in S$, since $M_1$ has no false positives. If $M_2$ gives the answer $w \in A^* \setminus S$, then we know that $w \notin S$, since $M_2$ has no false positives either. In each round, the probability of remaining without a definite answer is at most $\frac{1}{2}$, so by repeating the procedure of running both machines $k$ times, we learn with a probability of at least $1 - 2^{-k}$ for certain whether $w \in S$ or $w \notin S$. Thus, we can say that a ZPP-algorithm works like a deterministic algorithm whose expected running time is polynomial. Notation ZPP stands for Zero-error Probability in Polynomial time. A ZPP-algorithm is also called a Las Vegas algorithm.

Remark 2.1.5. The definition of BPP merely contains the idea that we accept languages or, to put it in other words, solve decision problems, using a probabilistic Turing machine that is required to give a correct answer with a probability larger than $\frac{1}{2}$. Notation BPP stands for Bounded-error Probability in Polynomial time. The constant $\frac{2}{3}$ is arbitrary, and any $c \in (\frac{1}{2}, 1)$ would define the same class. This is because we can efficiently increase the success probability by defining a Turing machine that runs the same computation several times and then takes the majority of the results as the answer. It is also widely believed that the class BPP is the class of problems that are efficiently solvable. This belief is known as the Extended Church–Turing thesis.

Earlier in this section it was mentioned that, according to the Church–Turing thesis, Turing machines are capable of capturing the notion of computability. However, it seems that probabilistic Turing machines with any of the above acceptance models also fit very well into the notion of "algorithmic computability". Is it true that any language which is accepted by a probabilistic Turing machine, regardless of the acceptance mode, can also be accepted by an ordinary Turing machine? The answer is yes, and it can be justified as follows: since the probabilities were required to be rational, we can always simulate the computation tree of a probabilistic Turing machine with an ordinary one that sequentially performs all possible computations and tracks the probabilities associated with them. In other words, knowledge of the probabilities governing a probabilistic algorithm also allows us to compute the probability distribution deterministically. (The restriction to rational probabilities plays an essential role here. This restriction can be criticized, but from the practical point of view, rational probabilities should be sufficient. Some authors allow all "computable numbers" as probabilities, but this would introduce more problems, such as: Which numbers are computable? What is the computational model for computing these numbers? Can the comparisons of computable numbers, needed to decide, e.g., whether the acceptance probability is at least $\frac{1}{2}$, be made efficiently?) Therefore, the notion of probabilistic computation does not shatter the border between decidability and undecidability.

On the other hand, any deterministic Turing machine can also be seen as a probabilistic machine having only one choice for each transition. Therefore, it is clear that
\[ \text{P} \subseteq \text{RP}, \quad \text{P} \subseteq \text{ZPP}, \quad \text{and} \quad \text{P} \subseteq \text{BPP}. \tag{2.3} \]
The simulation of a probabilistic machine by a deterministic one is done by sequentially going through all the possible computations of the probabilistic machine. Since the number of different computations may be exponential (recall Remark 2.1.2), the simulation may also take exponential time. Hence, we can ask whether probabilistic computation is more efficient than deterministic computation, i.e., whether some of the inclusions (2.3) are strict. Or is there a cunning way to imitate a probabilistic computation deterministically in such a way that the simulation time does not grow exponentially? The answers to such questions are unknown so far.
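The deterministic simulation just described can also be sketched directly: with rational probabilities, an ordinary machine can walk the whole computation tree and sum the acceptance probability exactly. A toy illustration (the transition table is invented, in the same format as the earlier sketch; note the exponential branching):

```python
from fractions import Fraction

# Toy machine with exact rational probabilities.
delta = {
    ("q0", "0"): [(("qa", "0", 0), Fraction(1, 2)), (("q0", "1", 0), Fraction(1, 2))],
    ("q0", "1"): [(("qr", "1", 0), Fraction(1, 2)), (("q0", "0", 0), Fraction(1, 2))],
}

def acceptance_probability(state, tape, head, t):
    """Exhaustively evaluate the computation tree for t steps and return
    the exact probability of reaching the accepting state qa."""
    if state == "qa":
        return Fraction(1)
    if state == "qr" or t == 0:
        return Fraction(0)
    total = Fraction(0)
    for (q2, a2, d), p in delta[(state, tape[head])]:
        new_tape = tape[:head] + (a2,) + tape[head + 1:]
        total += p * acceptance_probability(q2, new_tape, head + d, t - 1)
    return total

# The tree has exponentially many branches, which is exactly why this
# deterministic simulation is slow.
print(acceptance_probability("q0", ("0",), 0, 10))   # -> 341/512
```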
2.1.3 Multitape Turing Machines

A straightforward generalization of a Turing machine is a multitape Turing machine. This model of computation becomes useful when speaking about space-bounded computation.

Definition 2.1.5. A (deterministic) Turing machine with $k$ tapes over an alphabet $A$ is a sixtuple $(Q, A, \delta, q_0, q_a, q_r)$, where $q_0, q_a, q_r \in Q$ are the initial, accepting, and rejecting states respectively, and $\delta: Q \times A^k \to Q \times (A \times \{-1,0,1\})^k$ is the transition function.

A configuration of a $k$-tape Turing machine is a $(2k+1)$-tuple
\[ c = (q, x_1, y_1, x_2, y_2, \ldots, x_k, y_k), \tag{2.4} \]
where $q \in Q$ and $x_i, y_i \in A^*$. For a configuration (2.4), we say that $q$ is the current state and that $x_i y_i$ is the content of the $i$th tape. Let $x_i = v_i a_i$ and $y_i = b_i w_i$. Then we say that $b_i$ is the currently scanned symbol on the $i$th tape. If
\[ \delta(q, b_1, \ldots, b_k) = (r, c_1, d_1, \ldots, c_k, d_k), \]
where $r \in Q$, $c_i \in A$, and $d_i \in \{-1,0,1\}$, then the configuration $c$ yields $c' = (r, x_1', y_1', x_2', y_2', \ldots, x_k', y_k')$, where $(x_i', y_i') = (v_i, a_i c_i w_i)$ if $d_i = -1$, $(x_i', y_i') = (v_i a_i, c_i w_i)$ if $d_i = 0$, and $(x_i', y_i') = (v_i a_i c_i, w_i)$ if $d_i = 1$.

The concepts associated with computing, such as acceptance, rejection, computation time, etc., are defined for multitape Turing machines exactly in the same fashion as for ordinary Turing machines. But when talking about the space consumed by a computation, it may seem fair to ignore the space occupied by the input and the space needed for writing the output, and to count only the space consumed by the computation itself. Using multitape machines, this is achieved as follows: we choose $k > 2$ and identify the first tape as an input tape and the last tape as an output tape. The machine must never change the contents of the input tape, and the head on the output tape can never move to the left. If, using such a model of computation, one reaches a final configuration $c = (q, u_1, v_1, u_2, v_2, \ldots, u_k, v_k)$, then the space required by the computation is defined to be $\sum_{i=2}^{k-1} |u_i v_i|$.

Again, we may ask whether the computational capacity of multitape Turing machines is greater than that of ordinary ones. This time, we can even guarantee that the computational capacity of multitape machines is very close to the capacity of single-tape machines: it is left as an exercise to show that whatever is computable by a multitape machine in time $t$ is also computable by a single-tape Turing machine in time $O(t^2)$.

2.2 Quantum Information

It must first be noted that, despite the title, quantum information theory is not the topic of this section. Instead, we will merely concentrate on the descriptional differences between presenting information by using classical and quantum systems. In fact, in this section we will present the fundamentals of quantum information processing. A reader well aware of basic linear algebra will presumably have no difficulties in following this section, but for a reader feeling some uncertainty, we recommend consulting Sections 7.1, 7.2, and the initial part of Section 7.3.
Moreover, the basic notions of linear algebra are outlined in Section 8.3.

2.2.1 Quantum Bits

Definition 2.2.1. A quantum bit, qubit for short, is a two-level quantum system. Because there should not be any danger of confusion, we also say that the two-dimensional Hilbert space $H_2$ is a quantum bit. Space $H_2$ is equipped with a fixed basis $B = \{|0\rangle, |1\rangle\}$, a so-called computational basis. States $|0\rangle$ and $|1\rangle$ are called basis states. A general state of a single quantum bit is a vector
\[ c_0|0\rangle + c_1|1\rangle \tag{2.5} \]
of unit length, i.e., $|c_0|^2 + |c_1|^2 = 1$. We say that an observation of a quantum bit in state (2.5) will give 0 or 1 as an outcome with probabilities $|c_0|^2$ and $|c_1|^2$ respectively.

Definition 2.2.2. An operation on a qubit, called a unary quantum gate, is a unitary mapping $U: H_2 \to H_2$. In other words, a unary quantum gate defines a linear operation
\[ |0\rangle \mapsto a|0\rangle + b|1\rangle, \quad |1\rangle \mapsto c|0\rangle + d|1\rangle \]
such that the matrix
\[ \begin{pmatrix} a & c \\ b & d \end{pmatrix} \]
is unitary, i.e.,
\[ \begin{pmatrix} a & c \\ b & d \end{pmatrix}\begin{pmatrix} a^* & b^* \\ c^* & d^* \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \]
Notation $a^*$ stands for the complex conjugate of the complex number $a$. That the notation $A^*$ has already been used for the free monoid generated by the set $A$ should cause no misunderstandings; the meaning of a particular $*$-symbol should be clear from the context. In what follows, notation $(a, b)^T$ is used to indicate transposition, i.e., the column vector with entries $a$ and $b$.

Example 2.2.1. Let us use the coordinate representation $|0\rangle = (1, 0)^T$, $|1\rangle = (0, 1)^T$. Then the unitary matrix
\[ M_\neg = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \]
defines an action $M_\neg|0\rangle = |1\rangle$, $M_\neg|1\rangle = |0\rangle$. The unary quantum gate defined by $M_\neg$ is called a quantum not-gate.

Example 2.2.2. Let us examine a quantum gate defined by the unitary matrix
\[ \sqrt{M_\neg} = \begin{pmatrix} \frac{1+i}{2} & \frac{1-i}{2} \\ \frac{1-i}{2} & \frac{1+i}{2} \end{pmatrix}. \]
The action of $\sqrt{M_\neg}$ is
\[ \sqrt{M_\neg}\,|0\rangle = \tfrac{1+i}{2}|0\rangle + \tfrac{1-i}{2}|1\rangle, \tag{2.6} \]
\[ \sqrt{M_\neg}\,|1\rangle = \tfrac{1-i}{2}|0\rangle + \tfrac{1+i}{2}|1\rangle. \tag{2.7} \]
Observations of (2.6) and (2.7) will give 0 or 1 as the outcome, both with a probability of $\frac{1}{2}$. Because $\sqrt{M_\neg}\sqrt{M_\neg} = M_\neg$, gate $\sqrt{M_\neg}$ is called the square root of the not-gate.

Example 2.2.3. Let us study a quantum gate $W_2$ defined by the matrix
\[ W_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}. \]
The action of $W_2$ is
\[ W_2|0\rangle = \tfrac{1}{\sqrt{2}}|0\rangle + \tfrac{1}{\sqrt{2}}|1\rangle, \quad W_2|1\rangle = \tfrac{1}{\sqrt{2}}|0\rangle - \tfrac{1}{\sqrt{2}}|1\rangle. \]
Matrix $W_2$ is called a Walsh matrix, Hadamard matrix, or Hadamard–Walsh matrix, and it will eventually prove to be very useful.

A very important feature of quantum gates, already mentioned implicitly, is that they are linear, and therefore it suffices to know their action on the basis states. For example, if a Hadamard–Walsh gate acts on the state $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, the outcome is
\[ \tfrac{1}{\sqrt{2}}(W_2|0\rangle + W_2|1\rangle) = \tfrac{1}{2}(|0\rangle + |1\rangle) + \tfrac{1}{2}(|0\rangle - |1\rangle) = |0\rangle. \]

2.2.2 Quantum Registers

A system of two quantum bits is a four-dimensional Hilbert space $H_4 = H_2 \otimes H_2$ having orthonormal basis $\{|0\rangle|0\rangle, |0\rangle|1\rangle, |1\rangle|0\rangle, |1\rangle|1\rangle\}$. We also write $|0\rangle|0\rangle = |00\rangle$, $|0\rangle|1\rangle = |01\rangle$, etc. A state of a two-qubit system is a unit-length vector
\[ c_0|00\rangle + c_1|01\rangle + c_2|10\rangle + c_3|11\rangle, \tag{2.8} \]
so it is required that $|c_0|^2 + |c_1|^2 + |c_2|^2 + |c_3|^2 = 1$. Observation of a two-qubit system in state (2.8) will give 00, 01, 10, and 11 as an outcome with probabilities $|c_0|^2$, $|c_1|^2$, $|c_2|^2$, and $|c_3|^2$ respectively. On the other hand, if we choose to observe only one of the qubits, the standard rules of probability apply: an observation of the first (resp. second) qubit will give 0 or 1 with probabilities $|c_0|^2 + |c_1|^2$ and $|c_2|^2 + |c_3|^2$ (resp. $|c_0|^2 + |c_2|^2$ and $|c_1|^2 + |c_3|^2$).

Remark 2.2.1. Notice that here the tensor product of vectors does not commute: $|0\rangle|1\rangle \neq |1\rangle|0\rangle$. We use the linear ordering (writing from left to right) to address the qubits individually.
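The gates of Examples 2.2.1–2.2.3 can be checked mechanically. The following sketch (Python with numpy, not part of the original text) verifies their unitarity, the identity $\sqrt{M_\neg}\sqrt{M_\neg} = M_\neg$, and the observation probabilities of (2.6):

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)   # |0> = (1,0)^T
ket1 = np.array([0, 1], dtype=complex)   # |1> = (0,1)^T

M_not  = np.array([[0, 1], [1, 0]], dtype=complex)            # quantum not-gate
sqrt_M = np.array([[1 + 1j, 1 - 1j], [1 - 1j, 1 + 1j]]) / 2   # square root of not
W2     = np.array([[1, 1], [1, -1]]) / np.sqrt(2)             # Hadamard-Walsh

# Unitarity: U U* = I, where U* is the conjugate transpose.
for U in (M_not, sqrt_M, W2):
    assert np.allclose(U @ U.conj().T, np.eye(2))

# Applying sqrt(M_not) twice gives the not-gate (Example 2.2.2).
assert np.allclose(sqrt_M @ sqrt_M, M_not)

# Observing sqrt(M_not)|0> gives 0 or 1, each with probability |c|^2 = 1/2.
state = sqrt_M @ ket0
print(np.abs(state) ** 2)   # -> [0.5 0.5]
```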
Recall that the state $z \in H_4$ of a two-qubit system is decomposable if $z$ can be written as a product of states in $H_2$: $z = x \otimes y$. A state that is not decomposable is entangled.

Example 2.2.4. State $\frac{1}{2}(|00\rangle + |01\rangle + |10\rangle + |11\rangle)$ is decomposable, since
\[ \tfrac{1}{2}(|0\rangle|0\rangle + |0\rangle|1\rangle + |1\rangle|0\rangle + |1\rangle|1\rangle) = \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \cdot \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle), \]
as is easily verified. On the other hand, state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ is entangled. To see this, assume on the contrary that
\[ \tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle) = (a_0|0\rangle + a_1|1\rangle)(b_0|0\rangle + b_1|1\rangle) = a_0 b_0|00\rangle + a_0 b_1|01\rangle + a_1 b_0|10\rangle + a_1 b_1|11\rangle \]
for some complex numbers $a_0$, $a_1$, $b_0$, and $b_1$. But then
\[ a_0 b_0 = \tfrac{1}{\sqrt{2}}, \quad a_0 b_1 = 0, \quad a_1 b_0 = 0, \quad a_1 b_1 = \tfrac{1}{\sqrt{2}}, \]
which is absurd.

A binary quantum gate is a unitary mapping $H_4 \to H_4$. To give an example of such a gate, we again use the coordinate representation $|00\rangle = (1,0,0,0)^T$, $|01\rangle = (0,1,0,0)^T$, $|10\rangle = (0,0,1,0)^T$, and $|11\rangle = (0,0,0,1)^T$.

Remark 2.2.2. If two qubits are in the entangled state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, then observing one of them will give 0 or 1, both with a probability of $\frac{1}{2}$, but it is not possible to observe different values on the two qubits. It is interesting to notice that experiments have shown that this correlation can remain even if the qubits are spatially separated by more than 10 km [67]. This distant correlation opens opportunities for quantum cryptography and quantum communication protocols. A pair of qubits in state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ is called an EPR pair. (EPR stands for Einstein, Podolsky, and Rosen, who first regarded the distant correlation as a source of a paradox of quantum physics [69].) Notice that if both qubits of the EPR pair are run through a Hadamard gate, the resulting state is again $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, so it is impossible to give an ignorance interpretation for the EPR pair.

Example 2.2.5. Matrix
\[ M_{\text{cnot}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \]
defines a unitary mapping whose action on the basis states is $M_{\text{cnot}}|00\rangle = |00\rangle$, $M_{\text{cnot}}|01\rangle = |01\rangle$, $M_{\text{cnot}}|10\rangle = |11\rangle$, $M_{\text{cnot}}|11\rangle = |10\rangle$. Gate $M_{\text{cnot}}$ is called controlled not, since the second qubit (the target qubit) is flipped if and only if the first qubit (the control qubit) is 1.

We will give another example of a binary quantum gate after the following important definition.

Definition 2.2.3. The tensor product, also called the Kronecker product, of an $r \times s$ matrix $A = (a_{ij})$ and a $t \times u$ matrix $B$ is the $rt \times su$ matrix defined as
\[ A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1s}B \\ a_{21}B & a_{22}B & \cdots & a_{2s}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{r1}B & a_{r2}B & \cdots & a_{rs}B \end{pmatrix}. \]

If $M_1$ and $M_2$ are $2 \times 2$ matrices that describe unary quantum gates, then it is easy to verify that the joint action of $M_1$ on the first qubit and $M_2$ on the second one is described by the matrix $M_1 \otimes M_2$. This also generalizes to quantum systems of any size: if matrices $M_1$ and $M_2$ define unitary mappings on Hilbert spaces $H_n$ and $H_m$, then the $nm \times nm$ matrix $M_1 \otimes M_2$ defines a unitary mapping on the space $H_n \otimes H_m$. The action of $M_1 \otimes M_2$ is exactly the same as the action of $M_1$ on $H_n$ followed by the action of $M_2$ on $H_m$ (or vice versa). In particular, the mapping $M_1 \otimes M_2$ cannot introduce entanglement between systems $H_n$ and $H_m$.
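Definition 2.2.3 corresponds directly to numpy's Kronecker product, and the entangling behavior of $M_{\text{cnot}}$ can be watched numerically. In the sketch below, the rank-1 test on the coefficient matrix is one standard way of testing decomposability (an assumption of this illustration, not a notion introduced in the text):

```python
import numpy as np

W2 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

ket00 = np.array([1, 0, 0, 0], dtype=float)   # basis order |00>,|01>,|10>,|11>

# Hadamard on the first qubit only: the joint action is W2 (x) I2.
state = np.kron(W2, I2) @ ket00               # (|00> + |10>)/sqrt(2), decomposable

# CNOT turns this decomposable state into the entangled EPR pair.
epr = CNOT @ state
print(epr)                                    # -> [0.707..., 0, 0, 0.707...]

# A product state has a rank-1 coefficient matrix; the EPR pair does not,
# so it cannot be written as a tensor product of one-qubit states.
print(np.linalg.matrix_rank(epr.reshape(2, 2)))   # -> 2, hence entangled
```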
Example 2.2.6. Let $M_1 = M_2 = W_2$ be the Hadamard matrix of the previous section. The action on both qubits with a Hadamard–Walsh matrix can be seen as a binary quantum gate whose matrix is
\[ W_4 = W_2 \otimes W_2 = \frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}, \]
and the action of $W_4$ can be written as
\[ W_4|x_0 x_1\rangle = \tfrac{1}{2}(|0\rangle + (-1)^{x_0}|1\rangle)(|0\rangle + (-1)^{x_1}|1\rangle) = \tfrac{1}{2}(|00\rangle + (-1)^{x_1}|01\rangle + (-1)^{x_0}|10\rangle + (-1)^{x_0 + x_1}|11\rangle) \tag{2.9} \]
for any $x_0, x_1 \in \{0, 1\}$.

Remark 2.2.3. Note that the state (2.9) is decomposable. This is, of course, natural, since the initial state $|x_0 x_1\rangle$ is decomposable and $W_4$ also decomposes into a product of unary gates, $W_4 = W_2 \otimes W_2$. On the other hand, $M_{\text{cnot}}$ cannot be expressed as a tensor product of $2 \times 2$ matrices. This could, of course, be seen by assuming the contrary, as in Example 2.2.4, but we can use another argument. Consider a two-qubit system in the initial state $|00\rangle$. The action of the Hadamard–Walsh gate on the first qubit transforms this into the state
\[ \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|0\rangle = \tfrac{1}{\sqrt{2}}(|00\rangle + |10\rangle). \tag{2.10} \]
But the action of $M_{\text{cnot}}$ turns the decomposable state (2.10) into the entangled state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. Because $M_{\text{cnot}}$ introduces entanglement, it cannot be a tensor product of two unary quantum gates.

By a quantum register of length $m$ we understand an ordered system of $m$ qubits. The state space of such a system is the $m$-fold tensor product $H_{2^m} = H_2 \otimes \cdots \otimes H_2$, and the basis states are $\{|\boldsymbol{x}\rangle \mid \boldsymbol{x} \in \{0,1\}^m\}$. Identifying $\boldsymbol{x} = x_1 \ldots x_m$ with the binary representation of a number, we can also say that the basis states of an $m$-qubit register are $\{|a\rangle \mid a \in \{0, 1, \ldots, 2^m - 1\}\}$.

A peculiar feature, the exponential packing density associated with quantum systems, can be plainly seen here: a general state of a system of $m$ quantum bits is
\[ c_0|0\rangle + c_1|1\rangle + \cdots + c_{2^m-1}|2^m - 1\rangle, \]
where $|c_0|^2 + |c_1|^2 + \cdots + |c_{2^m-1}|^2 = 1$. That is to say, a general description of a system of $m$ two-state quantum systems requires $2^m$ complex numbers. Hence, the description size of the system is exponential in its physical size. The time evolution of an $m$-qubit system is determined by unitary mappings on $H_{2^m}$. The size of the matrix of such a mapping is $2^m \times 2^m$, also exponential in the physical size of the system. (It should no longer be surprising that Feynman found the effective deterministic simulation of a quantum system difficult. Due to the interference effects, it also seems difficult to simulate a quantum system efficiently with a probabilistic computer.) A more detailed explanation of quantum register processing is provided in Section 2.3.3.

2.2.3 More on Quantum Information

So far we have been talking about qubits, but everything generalizes to $n$-ary quantum digits as well: if we have an alphabet $A = \{a_1, \ldots, a_n\}$, we can identify the letters with basis states $|a_1\rangle, \ldots, |a_n\rangle$ of an $n$-level quantum system. We say that such a basis is a quantum representation of alphabet $A$. The key features of quantum representations were already listed at the end of the introductory chapter:

• A set of $n$ elements can be identified with the vectors of an orthonormal basis of an $n$-dimensional complex vector space $H_n$. We call $H_n$ a state space. When we have a fixed basis $|a_1\rangle, \ldots, |a_n\rangle$, we call these vectors basis states. The basis that we choose to fix is usually called a computational basis.
• A general state of a quantum system is a unit-length vector in the state space. If $\alpha_1|a_1\rangle + \cdots + \alpha_n|a_n\rangle$ is a state, then the system is seen in state $a_i$ with a probability of $|\alpha_i|^2$.
• The state space of a compound system consisting of two subsystems is the tensor product of the subsystem state spaces.
• In this formalism, the state transformations are length-preserving linear mappings. It can be shown that these mappings are exactly the unitary mappings of the state space.

The above list gives the operations that can be performed on quantum systems (under this chosen formalism).
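The exponential packing density noted above is easy to make concrete: the state vector of an $m$-qubit register has $2^m$ coordinates. A minimal numpy sketch:

```python
import numpy as np

m = 10
W2 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

# Hadamard on every qubit of |0...0>: a register of only m qubits, whose
# general state nevertheless needs 2^m complex coefficients.
state = np.array([1.0])
for _ in range(m):
    state = np.kron(state, W2 @ np.array([1.0, 0.0]))

print(len(state))     # -> 1024 = 2^m coefficients for m = 10 qubits
print(state[:4])      # each amplitude equals 2^(-m/2)
assert np.isclose(np.sum(np.abs(state) ** 2), 1.0)
```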
We now present a somewhat surprising result called the No-cloning Theorem, due to W. K. Wootters and W. H. Zurek [72]. Consider a quantum system having $n$ basis states $|a_1\rangle, \ldots, |a_n\rangle$. Let us denote the state space by $H_n$ and designate the state $|a_1\rangle$ as the "blank sheet state". A unitary mapping $U$ in $H_n \otimes H_n$ is called a quantum copy machine if, for any state $|x\rangle \in H_n$,
\[ U(|x\rangle|a_1\rangle) = |x\rangle|x\rangle. \]

Theorem 2.2.1 (No-cloning Theorem). For $n > 1$ there is no quantum copy machine.

Proof. Assume that a quantum copy machine $U$ exists even though $n > 1$. Because $n > 1$, there are two orthogonal states $|a_1\rangle$ and $|a_2\rangle$. We should have $U(|a_1\rangle|a_1\rangle) = |a_1\rangle|a_1\rangle$ and $U(|a_2\rangle|a_1\rangle) = |a_2\rangle|a_2\rangle$, and also
\[ U\Bigl(\tfrac{1}{\sqrt{2}}(|a_1\rangle + |a_2\rangle)|a_1\rangle\Bigr) = \tfrac{1}{\sqrt{2}}(|a_1\rangle + |a_2\rangle) \cdot \tfrac{1}{\sqrt{2}}(|a_1\rangle + |a_2\rangle) = \tfrac{1}{2}(|a_1\rangle|a_1\rangle + |a_1\rangle|a_2\rangle + |a_2\rangle|a_1\rangle + |a_2\rangle|a_2\rangle). \]
But since $U$ is linear,
\[ U\Bigl(\tfrac{1}{\sqrt{2}}(|a_1\rangle + |a_2\rangle)|a_1\rangle\Bigr) = \tfrac{1}{\sqrt{2}}U(|a_1\rangle|a_1\rangle) + \tfrac{1}{\sqrt{2}}U(|a_2\rangle|a_1\rangle) = \tfrac{1}{\sqrt{2}}|a_1\rangle|a_1\rangle + \tfrac{1}{\sqrt{2}}|a_2\rangle|a_2\rangle. \]
The above two representations of $U(\frac{1}{\sqrt{2}}(|a_1\rangle + |a_2\rangle)|a_1\rangle)$ do not coincide, by the very definition of the tensor product: a contradiction. □

The No-cloning Theorem thus states that there is no allowed operation (unitary mapping) that would produce a copy of an arbitrary quantum state. Notice also that in the above proof we did not make any use of unitarity; only the linearity of the time-evolution mapping was needed.

If, however, we are satisfied with cloning only the basis states, there is a solution: let $I = \{1, \ldots, n\}$ be the set of indices. The partially defined mapping $f: I \times I \to I \times I$, $f(i, 1) = (i, i)$, is clearly injective, so we can complete the definition (in many ways) such that $f$ becomes a permutation of $I \times I$ and still satisfies $f(i, 1) = (i, i)$. Let $f(i, j) = (i', j')$. Then the linear mapping defined by $U|a_i\rangle|a_j\rangle = |a_{i'}\rangle|a_{j'}\rangle$ is a permutation of the basis vectors of $H_n \otimes H_n$, and any such permutation is unitary, as is easily verified. Moreover, $U(|a_i\rangle|a_1\rangle) = |a_i\rangle|a_i\rangle$, so $U$ is a copy-machine operation on the basis vectors.

Remark 2.2.4. Notice that, when defining the unitary mapping $U(|a_i\rangle|a_1\rangle) = |a_i\rangle|a_i\rangle$, we do not need to assume that the second system looks like the first one; the only thing we need is that the second system has at least as many basis states as the first one. We could, therefore, also define a unitary mapping $U|a_i\rangle|b_1\rangle = |a_i\rangle|b_i\rangle$, where $|b_1\rangle, \ldots, |b_m\rangle$ ($m \ge n$) are the basis states of some other system. What is interesting here is that we could regard the second system as a measurement apparatus designed to observe the state of the first system. Thus, $|b_1\rangle$ could be interpreted as the "initial pointer state" and $U$ as a "measurement interaction". Measurements that can be described in this fashion are called von Neumann–Lüders measurements. The measurement interaction is also discussed in Section 7.4.2.
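The contrast between Theorem 2.2.1 and the basis-state copy machine can be watched numerically: the controlled-not gate copies basis states, $U|x\rangle|0\rangle = |x\rangle|x\rangle$, but on a superposition linearity forces exactly the mismatch exhibited in the proof. (A numpy sketch.)

```python
import numpy as np

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# On basis states, CNOT acts as a copy machine: |x>|0> -> |x>|x>.
for x in (ket0, ket1):
    assert np.allclose(CNOT @ np.kron(x, ket0), np.kron(x, x))

# On a superposition it does not: cloning psi = (|0>+|1>)/sqrt(2) would
# give the product state psi (x) psi, but linearity yields the EPR state.
psi = (ket0 + ket1) / np.sqrt(2)
cloned_would_be = np.kron(psi, psi)            # (|00>+|01>+|10>+|11>)/2
actually = CNOT @ np.kron(psi, ket0)           # (|00>+|11>)/sqrt(2)
print(np.allclose(actually, cloned_would_be))  # -> False
```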
We conclude this section with a discussion concerning the observation procedure. So far, we have silently assumed that quantum systems are used for probabilistic information processing according to the following scheme:

• The system is first prepared in an initial basis state.
• Next, the unitary information processing is carried out.
• Finally, we observe the system to see the outcome.

What is missing from the above procedure is that we have not considered the possibility of making an intermediate observation during the unitary processing. In other words, we have not discussed the effect of an observation upon the state of a quantum system. For a more systematic treatment of observation, the reader is advised to study Section 7.3.2. For now, we will present, in a simplified way, the most widely used method of handling state changes during an observation procedure.

Suppose that a system in state $\sum_{i=1}^{n} \alpha_i|x_i\rangle$ is observed with $x_k$ as the outcome. According to the projection postulate (which can be seen as an ad hoc explanation for the state transformation during the measurement process), the state of the system after the observation is $|x_k\rangle$. Suppose, then, that we have a compound system in state
\[ \sum_{i=1}^{n}\sum_{j=1}^{m} \alpha_{ij}|x_i\rangle|y_j\rangle, \tag{2.11} \]
and the first system is observed with outcome $x_k$ (notice that the probability of observing $x_k$ is $P(k) = \sum_{j=1}^{m}|\alpha_{kj}|^2$). The projection postulate now implies that the post-observation state of the whole system is
\[ \frac{1}{\sqrt{P(k)}}\sum_{j=1}^{m} \alpha_{kj}|x_k\rangle|y_j\rangle. \tag{2.12} \]
In other words, the initial state (2.11) of the system is projected onto the subspace that corresponds to the observed state and renormalized to unit length.

It is now worth heavily emphasizing that the state evolution from (2.11) to (2.12) given by the projection postulate is not consistent with the unitary time evolution: unitary evolution is always reversible, but there is no way to recover state (2.11) from (2.12). In fact, no explanation of the observation process which is consistent with quantum mechanics (using a certain interpretation) has ever been discovered. The difficulty arising when trying to find such an explanation is usually referred to as the measurement paradox of quantum physics.

However, instead of having intermediate observations that cause a "collapse of the state vector" from (2.11) to (2.12), we can introduce an auxiliary system with a unitary measurement interaction (a basis-state copy machine) $|x_i\rangle|x_1\rangle \mapsto |x_i\rangle|x_i\rangle$. That is, we replace the collapse (2.11) $\mapsto$ (2.12) by a measurement interaction, which turns the state
\[ \Bigl(\sum_{i=1}^{n}\sum_{j=1}^{m} \alpha_{ij}|x_i\rangle|y_j\rangle\Bigr)|x_1\rangle = \sum_{i=1}^{n}\sum_{j=1}^{m} \alpha_{ij}|x_i\rangle|y_j\rangle|x_1\rangle \tag{2.13} \]
into
\[ \sum_{i=1}^{n}\sum_{j=1}^{m} \alpha_{ij}|x_i\rangle|y_j\rangle|x_i\rangle. \tag{2.14} \]
The transformation from (2.13) to (2.14), despite appearing different from (2.11) $\mapsto$ (2.12) at first glance, has many similar features. In fact, using the notation $P(k) = \sum_{j=1}^{m}|\alpha_{kj}|^2$ again, we can rewrite (2.14) as
\[ \sum_{k=1}^{n}\sqrt{P(k)}\Bigl(\frac{1}{\sqrt{P(k)}}\sum_{j=1}^{m}\alpha_{kj}|x_k\rangle|y_j\rangle\Bigr)|x_k\rangle. \tag{2.15} \]
Let us now interpret (2.14): the third system which we introduced shatters the whole system into $n$ orthogonal subspaces which are, for each $k \in \{1, \ldots, n\}$, spanned by the vectors $|x_i\rangle|y_j\rangle|x_k\rangle$, $i \in \{1, \ldots, n\}$, $j \in \{1, \ldots, m\}$. The left multipliers of the vectors $\sqrt{P(k)}|x_k\rangle$ are unit-length vectors, exactly the same as in (2.12). Moreover, the probability of observing $x_k$ in the right-most system of (2.14) is $P(k)$. Therefore, it should be clear that, if the quantum information processing continues independently of the third system, then the final probability distribution is the same whether the operation (2.11) $\mapsto$ (2.12) or the operation (2.13) $\mapsto$ (2.14) is used. For this reason, we will not consider intermediate observations in this book, but may refer to them only as notational simplifications of the procedure (2.13) $\mapsto$ (2.14).
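The collapse (2.11) $\mapsto$ (2.12) is just a projection followed by renormalization, which is short to spell out. A sketch for a small compound system (the coefficient matrix is arbitrary illustration data):

```python
import numpy as np

# Coefficients a_ij of a bipartite state sum_ij a_ij |x_i>|y_j> (n = m = 2).
a = np.array([[0.5, 0.5],
              [0.5, -0.5]])

def observe_first(a, k):
    """Apply the projection postulate: observe x_k on the first system.
    Returns the outcome probability P(k) and the renormalized
    post-observation coefficients, as in (2.12)."""
    P_k = np.sum(np.abs(a[k]) ** 2)        # P(k) = sum_j |a_kj|^2
    post = np.zeros_like(a)
    post[k] = a[k] / np.sqrt(P_k)          # project onto |x_k>, renormalize
    return P_k, post

P0, post = observe_first(a, 0)
print(P0)     # -> 0.5
print(post)   # first system in |x_0>, second in (|y_0>+|y_1>)/sqrt(2)
```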
2.2.4 Quantum Turing Machines

To provide a model for quantum computation, one can consider a straightforward generalization of a probabilistic Turing machine, replacing the probabilities with transition amplitudes. Thus, there should be a transition amplitude function $\delta: Q \times A \times Q \times A \times \{-1,0,1\} \to \mathbb{C}$ such that $\delta(q_1, a_1, q_2, a_2, d)$ gives the amplitude that, whenever the machine is in state $q_1$ scanning symbol $a_1$, it will replace $a_1$ with $a_2$, enter state $q_2$, and move the head in direction $d \in \{-1,0,1\}$.

Definition 2.2.4. A quantum Turing machine (QTM) over alphabet $A$ is a sixtuple $(Q, A, \delta, q_0, q_a, q_r)$, where $q_0, q_a, q_r \in Q$ are the initial, accepting, and rejecting states. The transition amplitude function must satisfy
\[ \sum_{(q_2, a_2, d) \in Q \times A \times \{-1,0,1\}} |\delta(q_1, a_1, q_2, a_2, d)|^2 = 1 \]
for any $(q_1, a_1) \in Q \times A$. Again, to avoid difficulties with computability issues, we require that, for any value $\delta(q_1, a_1, q_2, a_2, d) = x + yi$, both $x$ and $y$ are rational.

Remark 2.2.5. In some cases it would be handy to also accept amplitudes which are not rational. We will consider this later.

Notions such as "configuration yields another" are straightforward generalizations of those associated with probabilistic computation. We can also generalize the acceptance models NP, RP, ZPP, and BPP to correspond to quantum computation. For any family $F$ of languages accepted by some classical model of computation (deterministic or probabilistic), we usually write QF for the corresponding class in quantum computation. However, there is already quite a well-established tradition of denoting the quantum counterpart of BPP (the family of languages accepted by a polynomial-time quantum Turing machine with a correctness probability of at least $\frac{2}{3}$) not by QBPP but by BQP, and the quantum counterpart of P (the family of languages accepted by a polynomial-time quantum Turing machine with a probability of 1) is usually denoted by EQP, standing for Exact acceptance by a Quantum computer in Polynomial time. Also, the quantum counterpart of NP is usually denoted by NQP.

As in probabilistic computation, a general configuration of a quantum Turing machine can be seen as a combination
\[ \alpha_1 c_1 + \alpha_2 c_2 + \cdots + \alpha_m c_m \tag{2.16} \]
of basis configurations. Taking the basis configurations as an orthonormal basis of an infinite-dimensional vector space, we see that the general configurations, superpositions, are merely the unit-length vectors of the configuration space. Moreover, the transition amplitude function determines a linear mapping $M_\delta$ in the state space. From the theory of quantum mechanics (to be precise: from the formalism that we are using) there now arises a further requirement on the transition amplitude function $\delta$: the linear mapping $M_\delta$ in the configuration space determined by $\delta$ should be unitary. It turns out that the unitarity of the mapping $M_\delta$ can be determined by local conditions on the transition amplitude function (see [9], [34], and [47]), but the conditions are quite technical and we will not present them here. Instead, we will study quantum circuits as a model for quantum computation in the next chapter. The reason for this choice is that, compared to QTMs, the quantum circuit model is much more straightforward to use for describing quantum algorithms. In fact, all of the quantum algorithms studied in Chapters 3–5 are given using the quantum circuit formalism.

We end this section by examining a couple of properties of quantum Turing machines. The first thing to consider is that the transition function of a QTM determines a unitary and, therefore, reversible time evolution in the configuration space. However, ordinary Turing machines can be irreversible, too. (We call an ordinary Turing machine reversible if each configuration admits a unique predecessor.)
The question is how powerful reversible computation is: can we design a reversible Turing machine that performs any particular computational task? A positive answer to this question was first given by Lecerf in [39], who intended to demonstrate that the Post Correspondence Problem remains undecidable even for injective morphisms. (The Post Correspondence Problem, or PCP for short, was among the first computational problems shown to be undecidable; the undecidability of the PCP was established by Emil Post in [54]. The importance of the PCP lies in the simple combinatorial formulation of the problem, which makes it very useful for establishing other undecidability results and for studying the boundary between decidability and undecidability.) Lecerf's constructions were later extended by Ruohonen in [61], but Bennett in [6] was the first to give a model of reversible computation simulating the original, possibly irreversible, computation with constant slowdown but possibly with a huge increase in the space consumed (see also [35]). Bennett's work was at least partially motivated by a thermodynamical problem: according to Landauer [38], an irreversible overwriting of a bit causes at least $kT\ln 2$ joules of energy dissipation. (Here $k = 1.380658 \cdot 10^{-23}$ J/K is Boltzmann's constant and $T$ is the absolute temperature.) This theoretical lower bound can always be avoided by using reversible computation, which is possible according to the results of Lecerf and Bennett.

Bennett's construction of a reversible Turing machine uses a three-tape Turing machine with an input tape, a history tape, and an output tape. Reversibility is obtained by simulating the original machine on the input tape, all the while writing down the history of the computation, i.e., the transition rules used so far, onto the history tape. When the machine stops, the output is copied from the input tape to the empty output tape, and the computation is run backwards (also a reversible procedure) to erase the history tape for future use. The amount of space this construction consumes is proportional to the computation time of the original computation, but by applying the erasure of the history tape recursively, the space requirement can be reduced even to $O(s(n)\log t(n))$ while using time $O(t(n)\log t(n))$, where $s$ and $t$ are the original space and time consumption [7]. For space/time trade-offs in reversible computing, see also [40].

Thus, it is established that whatever is computable by a Turing machine is also computable by a reversible Turing machine, which can be seen as a quantum Turing machine as well. Therefore, QTMs are at least as powerful as ordinary Turing machines. The next question which arises is whether QTMs can do something that ordinary Turing machines cannot. The answer to this question is no, the reason being the same as for probabilistic Turing machines: we can use an ordinary Turing machine to simulate the computation of a QTM by remembering all the coexisting configurations in a superposition together with their amplitudes. Notice that here we again use the assumption that all the transition amplitudes can be written as $x + yi$, where $x$ and $y$ are rational. This assumption can, of course, be criticized, but the feasibility of a model having arbitrary complex-number amplitudes is also questionable.
It is, however, true that for some operations on quantum systems, non-rational amplitudes, such as $\frac{1}{\sqrt{2}}$, would be very natural. Therefore, in practice, we will also use amplitudes $x + yi$ where $x$ and $y$ are not rational, but only if there is an "efficient" way to find rational approximations of $x$ and $y$. It is left to the reader to verify that in each example in this book, the amplitudes not expressible by rational numbers can be well approximated by amplitudes which can be so expressed.

We have good reasons to believe (as did Feynman) that QTMs cannot be efficiently simulated by ordinary Turing machines, not even by probabilistic ones. One major reason for this belief lies in Shor's factoring algorithm [63]: there is a polynomial-time, probabilistic quantum algorithm for factoring integers, but no classical polynomial-time algorithm for that purpose has been found, not even a probabilistic one, despite a huge effort.

The next and final issue connected to QTMs in this section is the existence of a universal quantum Turing machine. We say that a Turing machine is universal if it can simulate the computation of any other Turing machine in the following sense: the description (the transition rules) and the input of an arbitrary Turing machine $M$ are given to the universal machine $U$ suitably encoded, and the universal machine $U$ performs the computation of $M$ on the encoded string. We could also say that the universal Turing machine $U$ is programmable so as to perform any Turing machine computation. It can be shown that a universal Turing machine with 5 states and an alphabet of size 5 exists; see [57]. In [23], D. Deutsch proved the existence of a universal quantum Turing machine capable of simulating any other QTM with arbitrary precision, but Deutsch did not consider the efficiency of the simulation. Bernstein and Vazirani supplied in [9] a construction of a universal QTM which simulates any other quantum Turing machine in the following sense: given the description of a QTM $M$, a natural number $T$, and $\varepsilon > 0$, the universal machine can simulate $M$ with precision $\varepsilon$ for $T$ steps in time polynomial in $T$ and $\frac{1}{\varepsilon}$.

2.3 Circuits

2.3.1 Boolean Circuits

Recall that a Turing machine can be regarded as a facility for computing a partially defined function $f: A^* \to A^*$. We will now fix $A = \mathbb{F}_2 = \{0, 1\}$ to be the binary alphabet ($\mathbb{F}_2$ stands for the binary field with two elements 0 and 1, with addition and multiplication defined by $0+0 = 1+1 = 0$, $0+1 = 1$, $0 \cdot 0 = 0 \cdot 1 = 0$, and $1 \cdot 1 = 1$; thus $-1 = 1$ and $1^{-1} = 1$ in $\mathbb{F}_2$) and consider Boolean circuits computing functions $\{0,1\}^n \to \{0,1\}^m$. As the basic elements of these circuits we choose the functions $\wedge: \mathbb{F}_2^2 \to \mathbb{F}_2$ (the logical and-gate) defined by $\wedge(x_1, x_2) = 1$ if and only if $x_1 = x_2 = 1$; $\vee: \mathbb{F}_2^2 \to \mathbb{F}_2$ (the logical or-gate) defined by $\vee(x_1, x_2) = 0$ if and only if $x_1 = x_2 = 0$; and the logical not-gate $\neg: \mathbb{F}_2 \to \mathbb{F}_2$ defined by $\neg x = 1 - x$.

A Boolean circuit is an acyclic, directed graph whose nodes are labelled either with input variables, output variables, or logical gates $\wedge$, $\vee$, and $\neg$. An input variable node has no incoming arrows, while an output variable node has no outgoing arrows and exactly one incoming arrow. Nodes $\wedge$, $\vee$, and $\neg$ have 2, 2, and 1 incoming arrows respectively. In connection with Boolean circuits, the arrows of the graph are also called wires. The number of nodes of a circuit is called the complexity of the Boolean circuit. A Boolean circuit with $n$ input variables $x_1, \ldots, x_n$ and $m$ output variables $y_1, \ldots, y_m$ naturally defines a function $\mathbb{F}_2^n \to \mathbb{F}_2^m$: the input is encoded by giving the input variables values 0 and 1, and each gate computes a primitive function $\wedge$, $\vee$, or $\neg$.
The value of the function is read from the output variables $y_1, \ldots, y_m$.

Example 2.3.1. The Boolean circuit in Figure 2.1 computes the function $f: \mathbb{F}_2^2 \to \mathbb{F}_2^2$ defined by $f(0,0) = (0,0)$, $f(0,1) = (0,1)$, $f(1,0) = (0,1)$, and $f(1,1) = (1,0)$. Thus, $f(x_1, x_2) = (y_1, y_2)$, where $y_2$ is $x_1 + x_2$ modulo 2 and $y_1$ is the carry bit.

Fig. 2.1. A Boolean circuit computing the sum of bits $x_1$ and $x_2$

Emil Post proved a powerful theorem characterizing the complete sets of truth functions [53], which implies that all functions $\mathbb{F}_2^n \to \mathbb{F}_2^m$ can be computed by using Boolean circuits with gates $\wedge$, $\vee$, and $\neg$. A set $S$ of gates is called universal if all functions $\mathbb{F}_2^n \to \mathbb{F}_2$ can be constructed by using the gates in $S$. In what follows, we will demonstrate that $S = \{\wedge, \vee, \neg\}$ is a universal set of gates. The fact that all functions $\mathbb{F}_2^n \to \mathbb{F}_2^m$ are also computable follows from the observation that one can assemble any function $f: \mathbb{F}_2^n \to \mathbb{F}_2^m$ from $m$ functions which each compute a single bit.

First, it is easy to verify that $\vee(x_1, \vee(x_2, x_3))$ equals 0 if and only if $x_1 = x_2 = x_3 = 0$. For short, we denote $\vee(x_1, \vee(x_2, x_3)) = x_1 \vee x_2 \vee x_3$. Clearly this generalizes to any number of variables: using only the function $\vee$ on two variables, it is possible to construct a function $x_1 \vee x_2 \vee \cdots \vee x_n$ which takes the value 0 if and only if all the variables are 0. Similarly, using only $\wedge$ on two variables, one can construct the function $x_1 \wedge x_2 \wedge \cdots \wedge x_n$, which takes the value 1 if and only if all the variables are 1.

We define a function $M_{\boldsymbol{a}}$ for each $\boldsymbol{a} = (a_1, \ldots, a_n) \in \mathbb{F}_2^n$ by
\[ M_{\boldsymbol{a}}(x_1, \ldots, x_n) = \varphi_1(x_1) \wedge \varphi_2(x_2) \wedge \cdots \wedge \varphi_n(x_n), \]
where $\varphi_i(x_i) = \neg x_i$ if $a_i = 0$, and $\varphi_i(x_i) = x_i$ if $a_i = 1$. Thus, the functions $M_{\boldsymbol{a}}$ can be constructed by using only $\wedge$ and $\neg$. Moreover, $M_{\boldsymbol{a}}(x_1, \ldots, x_n) = 1$ if and only if $\varphi_i(x_i) = 1$ for each $i$, and $\varphi_i(x_i) = 1$ if and only if $x_i = a_i$. It follows that $M_{\boldsymbol{a}}$ is the characteristic function of the singleton $\{\boldsymbol{a}\}$: $M_{\boldsymbol{a}}(\boldsymbol{x}) = 1$ if and only if $\boldsymbol{x} = \boldsymbol{a}$. Using the characteristic functions $M_{\boldsymbol{a}}$, it is easy to build any function $f$:
\[ f = M_{\boldsymbol{a}_1} \vee M_{\boldsymbol{a}_2} \vee \cdots \vee M_{\boldsymbol{a}_k}, \]
where $\boldsymbol{a}_1, \boldsymbol{a}_2, \ldots, \boldsymbol{a}_k$ are exactly the elements of $\mathbb{F}_2^n$ on which $f$ takes the value 1.

Boolean circuits thus provide a facility for computing functions $\mathbb{F}_2^n \to \mathbb{F}_2^m$, where the number of input variables is fixed. Let us then consider a function $f: \{0,1\}^* \to \{0,1\}$ defined on all binary strings, and let $f_n$ be the restriction of $f$ to $\{0,1\}^n$. For each $n$ there is a Boolean circuit $C_n$ with $n$ input variables and one output variable computing $f_n$, and we say that $C_0, C_1, C_2, C_3, \ldots$ is a family of Boolean circuits computing $f$. A family of Boolean circuits having only one output node can also be used to recognize languages over the binary alphabet: we regard a binary string of length $n$ as accepted if the output of $C_n$ is 1; otherwise that string is rejected.

So far we have demonstrated that, for an arbitrary function $f_n: \mathbb{F}_2^n \to \mathbb{F}_2$, there is a circuit computing $f_n$. Thus, for an arbitrary binary language $L$ there is a family $C_0, C_1, C_2, \ldots$ accepting $L$. Does this violate the Church–Turing thesis? There are only denumerably many Turing machines, so there are only denumerably many binary languages that can be accepted by Turing machines. But all binary languages (and there are nondenumerably many such languages) can be accepted by Boolean circuit families.
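Both Example 2.3.1 and the characteristic-function construction above fit into a few lines of Python; in the sketch below, the carry bit of the half adder is rebuilt from the gates $M_{\boldsymbol{a}}$ alone (all function names are ours, chosen for illustration):

```python
from itertools import product

# Example 2.3.1: the half adder built from and, or, not.
def half_adder(x1, x2):
    y1 = x1 and x2                          # y1 = carry bit
    y2 = (x1 or x2) and not (x1 and x2)     # y2 = x1 + x2 mod 2
    return int(y1), int(y2)

# The universal construction: M_a(x) = 1 iff x = a, built from "and" and
# "not", and f = M_a1 or M_a2 or ... over the points where f equals 1.
def M(a):
    return lambda x: all(xi if ai else not xi for ai, xi in zip(a, x))

def build(ones):
    """Boolean function that is 1 exactly on the given set of points."""
    return lambda x: any(M(a)(x) for a in ones)

# Rebuild the carry bit of the half adder from its truth table.
carry = build({(1, 1)})
for x in product((0, 1), repeat=2):
    assert carry(x) == half_adder(*x)[0]
print("carry bit reconstructed from the M_a gates")
```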
Is the notion of a circuit family a stronger algorithmic device than a Turing machine? The answer is that the notion of a circuit family is not an algorithmic device at all. The reason is that the way in which we constructed the circuit $C_n$ requires exact knowledge of the values of $f_n$. In other words, without knowing the values of $f$, we do not know how to construct the circuit family $C_0, C_1, C_2, \ldots$; we can only state that such a family exists, but, without the construction, we cannot say that the circuit family $C_0, C_1, C_2, \ldots$ is an algorithmic device.

We say that a language $L$ has uniformly polynomial circuits if there exists a Turing machine $M$ that on input $1^n$ ($n$ consecutive 1's) outputs the graph of circuit $C_n$ using space $O(\log n)$, and the family $C_0, C_1, C_2, \ldots$ accepts $L$. For the proof of the following theorem, consult [48].

Theorem 2.3.1. A language $L$ has uniformly polynomial circuits if and only if $L \in$ P.

Since our main concern in this book is polynomial-time quantum computation, we will use a quantum circuit formalism which is analogous to the uniformly polynomial circuits.

2.3.2 Reversible Circuits

In Section 2.2 we introduced unary and binary quantum gates as unitary mappings. This is a generalization of the notion of reversible gates, so let us now study some properties of reversible gates. A reversible gate on $m$ bits is a permutation of $\mathbb{F}_2^m$, so there are $m$ input and output bits. Clearly there are $(2^m)!$ reversible gates on $m$ bits.

Example 2.3.2. Function $T: \mathbb{F}_2^3 \to \mathbb{F}_2^3$, $T(x_1, x_2, x_3) = (x_1, x_2, x_1 x_2 + x_3)$ (addition in $\mathbb{F}_2$), defines a reversible gate on three bits. This gate is called the Toffoli gate. The Toffoli gate does not change bits $x_1$ and $x_2$, but performs the not-operation on $x_3$ if and only if $x_1 = x_2 = 1$. The symbol in Figure 2.2 is used to signify the Toffoli gate.

Fig. 2.2. The Toffoli gate

A reversible circuit is a permutation of $\mathbb{F}_2^m$ composed of reversible gates. Thus, we make no fundamental distinction between reversible gates and reversible circuits; the only difference is contextual: we usually assume that there is a fixed set of reversible gates, from which an arbitrarily large reversible circuit can be built.

We call a set $R$ of reversible gates universal if any reversible circuit $C: \mathbb{F}_2^n \to \mathbb{F}_2^n$ can be constructed using the gates in $R$, constants, and some workspace. This means that, using the gates in $R$, we can construct a permutation $f: \mathbb{F}_2^{n+m} \to \mathbb{F}_2^{n+m}$ and fix a constant vector $(c_1, \ldots, c_m) \in \mathbb{F}_2^m$ such that
\[ f(x_1, \ldots, x_n, c_1, \ldots, c_m) = (y_1, \ldots, y_n, c_1, \ldots, c_m), \]
where $(y_1, \ldots, y_n) = C(x_1, \ldots, x_n)$. Moreover, in this construction we also allow bit permutations that swap any two bits; these bit permutations could, of course, be avoided by requiring only that $f$ simulates $C$ in such a way that the constants are scattered among the variables in fixed places, and that the output can be read at fixed bit positions. If a universal set $R$ consists of a single gate, this gate is said to be universal. When a universal set $R$ of reversible gates is fixed, we again say that the complexity of a circuit $C$ (with respect to the set $R$) is the number of gates in $C$.
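A quick sketch confirming that the Toffoli gate of Example 2.3.2 is indeed a reversible gate, i.e., a permutation of $\mathbb{F}_2^3$ (and, in fact, its own inverse):

```python
from itertools import product

def toffoli(x1, x2, x3):
    # T(x1, x2, x3) = (x1, x2, x1 x2 + x3), with addition mod 2
    return (x1, x2, (x1 * x2 + x3) % 2)

inputs  = list(product((0, 1), repeat=3))
outputs = [toffoli(*x) for x in inputs]

# A reversible gate is a permutation: all 2^3 outputs are distinct.
assert sorted(outputs) == inputs

# The Toffoli gate is an involution: applying it twice is the identity.
assert all(toffoli(*toffoli(*x)) == x for x in inputs)
print("Toffoli is a reversible (and self-inverse) gate on 3 bits")
```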
Earlier we demonstrated that a Boolean circuit constructed of the irreversible gates $\wedge$, $\vee$, and $\neg$ can compute any function $\mathbb{F}_2^n \to \mathbb{F}_2^m$, which especially implies that, by using these gates, we can build any reversible circuit as well. Recalling that any Turing machine can also be simulated by a reversible Turing machine, it is no longer surprising that all Boolean circuits can be simulated by using only reversible gates. In fact:

• Not-gates are already reversible, so we can use them directly.
• And-gates are irreversible, and therefore we simulate each of them by using a Toffoli gate $T(x_1, x_2, x_3) = (x_1, x_2, x_1 x_2 + x_3)$. Notice that $T(x_1, x_2, 0) = (x_1, x_2, x_1 x_2)$, so, using the constant 0 in the third component, a Toffoli gate computes the logical and of the variables $x_1$ and $x_2$.
• Since $x_1 \vee x_2 = \neg(\neg x_1 \wedge \neg x_2)$, we can replace all the $\vee$-gates in a Boolean circuit by $\wedge$- and $\neg$-gates.
• The fanout (multiple wires leaving a gate) is simulated by a controlled not-gate $C: \mathbb{F}_2^2 \to \mathbb{F}_2^2$, which is the permutation defined by $C(x_1, x_2) = (x_1, x_1 + x_2)$; the second bit is negated if and only if the first bit is 1. Again, $C(x_1, 0) = (x_1, x_1)$, so, by using the constant 0, we can duplicate the first bit $x_1$.

Because of the above properties, we can construct, for each Boolean circuit, a reversible circuit simulating the original one. This construction uses not-gates, Toffoli gates, controlled not-gates, and the constant 0. But using the constant 1, the Toffoli gates can also be used to replace the not- and controlled not-gates: $T(1, 1, x_1) = (1, 1, \neg x_1)$ and $T(1, x_1, x_2) = (1, x_1, x_1 + x_2)$, so Toffoli gates with constants 0 and 1 are sufficient to simulate any Boolean circuit with gates $\wedge$, $\vee$, and $\neg$. Since Boolean circuits can be used to build an arbitrary function $\mathbb{F}_2^n \to \mathbb{F}_2^m$, we have obtained the following theorem.

Theorem 2.3.2. The Toffoli gate is a universal reversible gate.

Remark 2.3.1. A more straightforward proof of the above theorem was given by Toffoli in [68]. It is interesting to note that there are universal two-qubit gates for quantum computation [25], but it can be shown that there are no universal reversible two-bit gates. In fact, there are only $4! = 24$ reversible two-bit gates, and all of them are linear, i.e., they can all be expressed as $T(\boldsymbol{x}) = A\boldsymbol{x} + \boldsymbol{b}$, where $\boldsymbol{b} \in \mathbb{F}_2^2$ and $A$ is an invertible matrix over the binary field. Thus, any function composed of them is also linear, but there are nonlinear reversible gates, such as the Toffoli gate.

2.3.3 Quantum Circuits

We identify again the bit strings $\boldsymbol{x} \in \mathbb{F}_2^m$ with an orthonormal basis $\{|\boldsymbol{x}\rangle \mid \boldsymbol{x} \in \mathbb{F}_2^m\}$ of a $2^m$-dimensional Hilbert space $H_{2^m}$. To represent linear mappings, we adopt the coordinate representation $|\boldsymbol{x}\rangle = \boldsymbol{e}_i = (0, \ldots, 1, \ldots, 0)^T$, where $\boldsymbol{e}_i$ is a column vector having zeroes elsewhere and 1 in the $i$th position, if the components of $\boldsymbol{x} = (x_1, \ldots, x_m)$ form the binary representation of the number $i - 1$. A reversible gate $f$ on $m$ bits is a permutation of $\mathbb{F}_2^m$, so any reversible gate also defines a linear mapping of $H_{2^m}$. There is a $2^m \times 2^m$ permutation matrix $M(f)$ representing this mapping: $M(f)_{ij} = 1$ if $f(\boldsymbol{e}_j) = \boldsymbol{e}_i$, and $M(f)_{ij} = 0$ otherwise. (A permutation matrix is a matrix having entries 0 and 1, with exactly one 1 in each row and column.)

Example 2.3.3. In $\mathbb{F}_2^3$ we denote $|000\rangle = (1, 0, \ldots, 0)^T$, $|001\rangle = (0, 1, \ldots, 0)^T$, $\ldots$, $|111\rangle = (0, 0, \ldots, 1)^T$. The matrix representation of the Toffoli gate is
\[ M(T) = \begin{pmatrix} I_2 & & & \\ & I_2 & & \\ & & I_2 & \\ & & & M_\neg \end{pmatrix}, \]
where $I_2$ is the $2 \times 2$ identity matrix, $M_\neg$ is the matrix of the not-gate (Example 2.2.1), and the blank blocks consist of zeroes.
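The constants trick used in the simulation above can be spelled out explicitly: with 0 and 1 wired to suitable inputs, the Toffoli gate alone yields and, not, and fanout. A sketch:

```python
def toffoli(x1, x2, x3):
    return (x1, x2, (x1 * x2 + x3) % 2)

# and: T(x1, x2, 0) = (x1, x2, x1 x2); the third output is x1 AND x2.
AND  = lambda x1, x2: toffoli(x1, x2, 0)[2]
# not: T(1, 1, x) = (1, 1, 1 - x); the third output is NOT x.
NOT  = lambda x: toffoli(1, 1, x)[2]
# controlled not / fanout: T(1, x, 0) = (1, x, x), so the second and
# third outputs both carry x (the bit is duplicated).
COPY = lambda x: toffoli(1, x, 0)[1:]

assert [AND(a, b) for a in (0, 1) for b in (0, 1)] == [0, 0, 0, 1]
assert [NOT(x) for x in (0, 1)] == [1, 0]
assert [COPY(x) for x in (0, 1)] == [(0, 0), (1, 1)]
print("and, not, and fanout recovered from the Toffoli gate alone")
```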
Quantum gates in general are introduced as straightforward generalizations of reversible gates.

Definition 2.3.1. A quantum gate on $m$ qubits is a unitary mapping on $H_2 \otimes \cdots \otimes H_2$ ($m$ times) which operates on a fixed number (independent of $m$) of qubits.

Because $M(f)^*_{ij} = 1$ if and only if $f(\boldsymbol{e}_i) = \boldsymbol{e}_j$, $M(f)^*$ represents the permutation inverse to $f$. Therefore, a permutation matrix is always unitary, and reversible gates are special cases of quantum gates. The notion of a quantum circuit is obtained from that of a reversible circuit by replacing the reversible gates with quantum gates. The only difference between quantum gates and quantum circuits is again contextual: we require that a quantum circuit be composed of quantum gates, each of which operates on a bounded number of qubits.

Definition 2.3.2. A quantum circuit on $m$ qubits is a unitary mapping on $H_{2^m}$ which can be represented as a concatenation of a finite number of quantum gates.

Since reversible circuits are also quantum circuits, we have already discovered that whatever is computable by a Boolean circuit is also computable by a quantum circuit. It is also interesting to compare the computational power of polynomial-size quantum circuits (that is, quantum circuits containing polynomially many quantum gates) and polynomial-time QTMs. A. Yao [73] has shown that their computational powers coincide.

3. Fast Factorization

In this chapter we present Shor's quantum algorithm for factoring integers. Shor's algorithm can be better understood after studying quantum Fourier transforms. The issues related to Fourier transforms and other mathematical details are handled in Chapter 8, but a reader having a solid mathematical knowledge of these concepts is advised to ignore the references to Chapter 8.

3.1 Quantum Fourier Transform

3.1.1 General Framework

Let $G = \{g_1, g_2, \ldots, g_n\}$ be an abelian group (we will use additive notation) and $\{\chi_1, \chi_2, \ldots, \chi_n\}$ the characters of $G$ (see Section 8.2). The functions $f: G \to \mathbb{C}$ form a complex vector space $V$; addition and scalar multiplication are defined pointwise. If $f_1, f_2 \in V$, then the standard inner product (see Section 8.3) of $f_1$ and $f_2$ is defined by
\[ \langle f_1 \mid f_2 \rangle = \sum_{k=1}^{n} f_1(g_k)^* f_2(g_k). \]
Any inner product induces a norm by $\|f\| = \sqrt{\langle f \mid f \rangle}$. In Section 8.2 it is demonstrated that the functions $\theta_i = \frac{1}{\sqrt{n}}\chi_i$ form an orthonormal basis of this vector space, so each $f \in V$ can be represented as
\[ f = c_1\theta_1 + c_2\theta_2 + \cdots + c_n\theta_n, \]
where the $c_i$ are complex numbers called the Fourier coefficients of $f$. The discrete Fourier transform of $f \in V$ is another function $\hat{f} \in V$ defined by $\hat{f}(g_i) = c_i$. Since the functions $\theta_i$ form an orthonormal basis, we see easily that $c_i = \langle \theta_i \mid f \rangle$, so
\[ \hat{f}(g_i) = \frac{1}{\sqrt{n}}\sum_{k=1}^{n} \chi_i(g_k)^* f(g_k). \tag{3.1} \]
The Fourier transform satisfies Parseval's identity
It is clear by the formula (3.1) that the QFT (3.5) is linear, but by Parseval's identity (3.2), QFT is even a unitary operation in H. As seen in Exercise 1, for basis vectors Igi) the operation (3.5) becomes (3.6) and hence it is clear that in basis Igi), the matrix of the QFT is (3.7) What kind of quantum circuit is needed to implement (3.6)7 The problem which arises is that, in a typical situation, n = IGI = dim(H) is large, but usually H is a tensor product of smaller spaces. However, quantum circuit operations were required to be local, not affecting a large number of quantum digits at the same time. In other words, the problem is how we can decompose the QFT matrix (3.7) into a tensor product of small matrices, or into a 3.1 Quantum Fourier Transform 43 product of few matrices that can be expressed as a tensor product of matrices operating only on some small subspaces. To approach this problem, let us assume that G = U EEl V is a direct sum of the subgroups U and V. Let r = lUI and s = lVI, hence IGI = rs. Let also U and V be represented by some quantum systems Hu and Hv with orthonormal bases respectively. Then, the tensor product Hu®Hv represents G in a very natural way: each 9 E G can be uniquely expressed as 9 = U + v, where U E U and v E V, so we represent 9 = U + v by Iu) Iv). Since we have a decomposition G= fJ x if, all the characters of G can be written as Xij(g) where and = Xij(U + v) = xf (u)xj (v), xf and xj are characters of U and V respectively (see Section 8.2.1), (i,j) E {l, ... ,r} x {l, ... ,s}. Thus, the Fourier transforms can also be decomposed: Lemma 3.1.1 (Fourier transform decomposition). Let G = U EEl V be a direct product of subgroups U and V and {Iu) Iv) I U E U, v E V} a quantum representation of the elements of G. Then lUi) IVj) ~ (~ t(Xf(Uk»* IUk») ()s t(xj(vz»* Ivz») k=l 1 = r (3.8) Z=l s ~ LLX:j(Uk +vz) IUk) Ivz) yrs (3.9) k=lZ=l is the Fourier transform on G. The decomposition of the Fourier transform may be applied recursively to groups U and V. It is also interesting to notice that the state (3.9) is decomposable. 3.1.2 Hadamard-Walsh Transform Let us study a special case G = are of IFr IFr. IFr. All the characters of the additive group where y E To implement the corresponding QFT we have to find a quantum circuit that performs the operation 44 3. Fast Factorization Ix) f--+ L (-I)"'·Y Iy) . yEIF2' The elements x = (Xl, X2, sentation by m qubits: ... , Xm) E 1FT have a very natural quantum repre- which corresponds exactly to the algebraic decomposition Thus, it follows from Lemma 3.1.1 that it suffices to find QFT on 1F 2, and the m-fold tensor product of that mapping will perform QFT on 1FT. However, QFT on 1F2 with representation {IO), II)} is defined by 1 + 11)), 10) f--+ J2(10) 11) f--+ J2(10) -11)), 1 which can be implemented by the Hadamard matrix (which is also denoted by W 2 ) H 1 (11 -11) . = J2 Thus, we have obtained the following result: Lemma 3.1.2. Let Hm = H®·· ·®H (m times). Then for Ix) = IXI)·· ·Ix m ) (3.10) For any m, the matrix Hm is also called a Hadamard matrix. QFT (3.10) is also called a Hadamard transform, Walsh transform, or Hadamard- Walsh transform. 3.1.3 Quantum Fourier Transform in Zn All the characters of group Zn are 27rI,xy Xy(x) = e n , where x and yare some representatives of cosets (See Sections 8.1.3, 8.1.4, and 8.2). To simplify the notations, we will denote any coset k+nZ by number the k. Using these notations, 3.1 Quantum Fourier Transform 45 Zn = {0,1,2, ... ,n-l}, where addition is computed modulo n. 
3.1.3 Quantum Fourier Transform in $\mathbb{Z}_n$

All the characters of the group $\mathbb{Z}_n$ are
\[ \chi_y(x) = e^{\frac{2\pi i x y}{n}}, \]
where $x$ and $y$ are representatives of cosets (see Sections 8.1.3, 8.1.4, and 8.2). To simplify the notation, we will denote any coset $k + n\mathbb{Z}$ by the number $k$. Using this notation, $\mathbb{Z}_n = \{0, 1, 2, \ldots, n-1\}$, where addition is computed modulo $n$; this notation will be used hereafter. The corresponding QFT on $\mathbb{Z}_n$ is the operation
\[ |x\rangle \mapsto \frac{1}{\sqrt{n}}\sum_{y=0}^{n-1} e^{-\frac{2\pi i x y}{n}}|y\rangle. \tag{3.11} \]

Now we face the problem that, unlike for $\mathbb{F}_2^m$, there is no evident way to give a quantum representation of the elements of $\mathbb{Z}_n$ in such a way that the basis $|0\rangle, |1\rangle, |2\rangle, \ldots, |n-1\rangle$ could be represented as a tensor product of two smaller bases representing some subsystems of $\mathbb{Z}_n$. If, however, we know some factorization $n = n_1 n_2$ such that $\gcd(n_1, n_2) = 1$, the Chinese Remainder Theorem (Theorem 8.1.2) offers us a decomposition: according to the Chinese Remainder Theorem, there is a bijection $F: \mathbb{Z}_{n_1} \times \mathbb{Z}_{n_2} \to \mathbb{Z}_n$ given by
\[ F((k_1, k_2)) = a_1 n_2 k_1 + a_2 n_1 k_2, \]
where $a_1$ (resp. $a_2$) is the multiplicative inverse of $n_2$ (resp. $n_1$) modulo $n_1$ (resp. $n_2$). By Exercise 2, the mapping $F$ is, in fact, an isomorphism. We also notice that, since $a_1$ (resp. $a_2$) has a multiplicative inverse, the mapping $k_1 \mapsto a_1 k_1$ (resp. $k_2 \mapsto a_2 k_2$) is a permutation of $\mathbb{Z}_{n_1}$ (resp. $\mathbb{Z}_{n_2}$), and this is all we need for decomposing the QFT (3.11).

Assume that the QFTs are available for $\mathbb{Z}_{n_1}$ and $\mathbb{Z}_{n_2}$, i.e., we have quantum representations $\{|0\rangle, |1\rangle, \ldots, |n_1 - 1\rangle\}$ and $\{|0\rangle, |1\rangle, \ldots, |n_2 - 1\rangle\}$ of $\mathbb{Z}_{n_1}$ and $\mathbb{Z}_{n_2}$, and we also have the routines for the mappings
\[ |k_1\rangle \mapsto \frac{1}{\sqrt{n_1}}\sum_{l_1=0}^{n_1-1} e^{-\frac{2\pi i k_1 l_1}{n_1}}|l_1\rangle \quad \text{and} \quad |k_2\rangle \mapsto \frac{1}{\sqrt{n_2}}\sum_{l_2=0}^{n_2-1} e^{-\frac{2\pi i k_2 l_2}{n_2}}|l_2\rangle. \]
We use the quantum representation $|k\rangle = |k_1\rangle|k_2\rangle$ for the elements of $\mathbb{Z}_n$ given by the Chinese Remainder Theorem with $k = a_1 n_2 k_1 + a_2 n_1 k_2$. By constructing a quantum circuit that performs the multiplications $(k_1, k_2) \mapsto (a_1 k_1, a_2 k_2)$ and concatenating it with the QFT circuits on both components, we get
\[ |k_1\rangle|k_2\rangle \mapsto \frac{1}{\sqrt{n_1 n_2}}\sum_{l_1=0}^{n_1-1}\sum_{l_2=0}^{n_2-1} e^{-\frac{2\pi i(a_1 n_2 k_1 l_1 + a_2 n_1 k_2 l_2)}{n_1 n_2}}|l_1\rangle|l_2\rangle = \frac{1}{\sqrt{n_1 n_2}}\sum_{l_1=0}^{n_1-1}\sum_{l_2=0}^{n_2-1} e^{-\frac{2\pi i F(k_1,k_2)F(l_1,l_2)}{n_1 n_2}}|F(l_1, l_2)\rangle \]
(here we used $a_1^2 n_2^2 \equiv a_1 n_2$ and $a_2^2 n_1^2 \equiv a_2 n_1 \pmod{n}$, so the exponent equals $F(k_1,k_2)F(l_1,l_2)$ modulo $n$), which is exactly the QFT (3.11) of $|k\rangle = |F(k_1, k_2)\rangle$ written in the representation $|l\rangle = |F(l_1, l_2)\rangle$. This decomposition can be applied recursively to $n_1$ and $n_2$.

Notice carefully that the previous decomposition requires some factorization $n = n_1 n_2$ with $\gcd(n_1, n_2) = 1$, but in general the problem is precisely to find a nontrivial factorization of $n$. Shor's factorization algorithm makes use of the inverse QFT on $\mathbb{Z}_{2^m}$, and since there is no coprime factorization of $2^m$, we have to offer something else. Fortunately, we can also obtain more knowledge of QFTs on the groups $\mathbb{Z}_{2^m}$. Since the inverse Fourier transform on $\mathbb{Z}_n$ is quite symmetric to (3.11) ($e^{-\frac{2\pi i xy}{n}}$ is replaced with $e^{\frac{2\pi i xy}{n}}$), choosing which one to study is a matter of personal preference. In what follows, we will learn how to implement the inverse Fourier transform on $\mathbb{Z}_{2^m}$.

The group $\mathbb{Z}_{2^m}$ has a very natural representation by $m$ qubits: an element $x = x_{m-1}2^{m-1} + x_{m-2}2^{m-2} + \cdots + x_1 2 + x_0$, where each $x_i \in \{0,1\}$, is represented as
\[ |x\rangle = |x_{m-1}\rangle|x_{m-2}\rangle\cdots|x_0\rangle. \]
How do we implement the inverse QFT
\[ |x\rangle \mapsto \frac{1}{\sqrt{2^m}}\sum_{y=0}^{2^m-1} e^{\frac{2\pi i x y}{2^m}}|y\rangle \tag{3.12} \]
with a quantum circuit? To approach this problem, we first make an observation which should no longer be a surprise after the previous decomposition results: the superposition on the right-hand side of (3.12) is decomposable.

Lemma 3.1.3.
\[ \sum_{y=0}^{2^m-1} e^{\frac{2\pi i x y}{2^m}}|y\rangle = (|0\rangle + e^{\frac{2\pi i x}{2}}|1\rangle)(|0\rangle + e^{\frac{2\pi i x}{2^2}}|1\rangle)\cdots(|0\rangle + e^{\frac{2\pi i x}{2^m}}|1\rangle). \tag{3.13} \]

Proof. Representing $|y\rangle = |y'b\rangle = |y'\rangle|b\rangle$, where $y'$ consists of the $m-1$ most significant bits and $b$ is the least significant bit of $y$ (so $y = 2y' + b$), we can divide the sum into two parts:
\[ \sum_{y=0}^{2^m-1} e^{\frac{2\pi i x y}{2^m}}|y\rangle = \sum_{y'=0}^{2^{m-1}-1} e^{\frac{2\pi i x \cdot 2y'}{2^m}}|y'\rangle|0\rangle + \sum_{y'=0}^{2^{m-1}-1} e^{\frac{2\pi i x(2y'+1)}{2^m}}|y'\rangle|1\rangle = \Bigl(\sum_{y'=0}^{2^{m-1}-1} e^{\frac{2\pi i x y'}{2^{m-1}}}|y'\rangle\Bigr)(|0\rangle + e^{\frac{2\pi i x}{2^m}}|1\rangle). \]
The claim now follows by induction. □
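Lemma 3.1.3 is easy to check against the defining sum: the sketch below expands the product state on the right-hand side of (3.13) and compares it, for every $x$, with the inverse-QFT amplitudes $e^{2\pi i x y/2^m}/\sqrt{2^m}$ of (3.12):

```python
import numpy as np

def inverse_qft_column(x, m):
    """The right-hand side of (3.12): amplitudes e^(2 pi i x y / 2^m) / sqrt(2^m)."""
    N = 2 ** m
    return np.exp(2j * np.pi * x * np.arange(N) / N) / np.sqrt(N)

def product_form(x, m):
    """The decomposition (3.13), normalized: a Kronecker product of one-qubit
    factors (|0> + e^(2 pi i x / 2^l) |1>) / sqrt(2) for l = 1, ..., m,
    the most significant qubit leftmost."""
    state = np.array([1.0 + 0j])
    for l in range(1, m + 1):
        factor = np.array([1.0, np.exp(2j * np.pi * x / 2 ** l)]) / np.sqrt(2)
        state = np.kron(state, factor)
    return state

m = 4
for x in range(2 ** m):
    assert np.allclose(product_form(x, m), inverse_qft_column(x, m))
print("decomposition (3.13) matches (3.12) for all x in Z_16")
```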
Decomposition (3.13) also gives us a clue on how to compute the transform (3.12) by using a quantum circuit. The phase of $|1\rangle$ in the $l$th factor of (3.13) depends only on the bits $x_0,x_1,\ldots,x_{l-1}$, since the bits $x_l,\ldots,x_{m-1}$ contribute only integer multiples of $2\pi i$; it can be written as

$$\exp\left(\frac{\pi i(2^{m-1}x_{m-1}+2^{m-2}x_{m-2}+\cdots+2x_1+x_0)}{2^{l-1}}\right)
=\exp\left(\frac{\pi i(2^{l-1}x_{l-1}+2^{l-2}x_{l-2}+\cdots+2x_1+x_0)}{2^{l-1}}\right)$$

$$=\exp(\pi ix_{l-1})\exp\left(\frac{\pi ix_{l-2}}{2}\right)\cdots\exp\left(\frac{\pi ix_1}{2^{l-2}}\right)\exp\left(\frac{\pi ix_0}{2^{l-1}}\right)
=(-1)^{x_{l-1}}\exp\left(\frac{\pi ix_{l-2}}{2}\right)\cdots\exp\left(\frac{\pi ix_1}{2^{l-2}}\right)\exp\left(\frac{\pi ix_0}{2^{l-1}}\right).$$

We can now describe a quantum circuit that performs the operation (3.12): given the quantum representation

$$|x_{m-1}\rangle|x_{m-2}\rangle\cdots|x_0\rangle, \qquad(3.14)$$

we swap the qubits to get the reverse binary representation

$$|x_0\rangle|x_1\rangle\cdots|x_{m-1}\rangle. \qquad(3.15)$$

In practice, this swapping is not necessary, since we could operate just as well with the reverse binary representations; we include the swapping here only for the sake of completeness. On the state (3.15) we proceed from right to left as follows: a Hadamard transform on the $m$th (the right-most) qubit gives

$$|x_0\rangle\cdots|x_{m-2}\rangle\,\frac{1}{\sqrt2}\bigl(|0\rangle+(-1)^{x_{m-1}}|1\rangle\bigr),$$

and we complete the phase $(-1)^{x_{m-1}}$ to $e^{\frac{2\pi ix}{2^m}}$ by using the phase rotations

$$e^{\frac{\pi i}{2}},\ \ldots,\ e^{\frac{\pi i}{2^{m-2}}},\ e^{\frac{\pi i}{2^{m-1}}}$$

conditionally. That is, for each $l\in\{1,2,\ldots,m-1\}$, a phase factor $e^{\frac{\pi i}{2^{m-l}}}$ is introduced to the $m$th qubit if and only if the $m$th and $l$th qubits are both 1. This procedure yields the state

$$|x_0\rangle\cdots|x_{m-2}\rangle\,\frac{1}{\sqrt2}\bigl(|0\rangle+e^{\frac{2\pi ix}{2^m}}|1\rangle\bigr).$$

The same procedure is then applied, from right to left, to the qubits at locations $m-1,\ldots,2,1$: for the qubit at location $l$ we first perform a Hadamard transform to get the qubit into the state $\frac{1}{\sqrt2}(|0\rangle+(-1)^{x_{l-1}}|1\rangle)$; then, for each $k$ in the range $l-1,l-2,\ldots,1$, we introduce the phase factor $e^{\frac{\pi i}{2^{l-k}}}$ conditionally, if and only if the $l$th and $k$th qubits are both 1. That can be achieved by the mapping

$$|0\rangle|0\rangle\mapsto|0\rangle|0\rangle,\quad |0\rangle|1\rangle\mapsto|0\rangle|1\rangle,\quad |1\rangle|0\rangle\mapsto|1\rangle|0\rangle,\quad |1\rangle|1\rangle\mapsto e^{\frac{\pi i}{2^{l-k}}}|1\rangle|1\rangle,$$

which acts on the $l$th and $k$th qubits. The matrix of this mapping can be written as

$$\Phi_{k,l}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&e^{\frac{\pi i}{2^{l-k}}}\end{pmatrix}. \qquad(3.16)$$

Ignoring the swapping $(x_{m-1},x_{m-2},\ldots,x_0)\mapsto(x_0,x_1,\ldots,x_{m-1})$, this procedure results in the network of Figure 3.1 with $\frac12m(m+1)$ gates. In Figure 3.1, the subindices of the gates $\Phi$ are omitted for typographical reasons. The $\Phi$-gates are (from left to right) $\Phi_{m-1,m-2}$, $\Phi_{m-1,m-3}$, $\ldots$, $\Phi_{m-1,0}$, $\Phi_{m-2,m-3}$, $\ldots$, $\Phi_{m-2,0}$, $\ldots$, and $\Phi_{m-3,0}$.

[Fig. 3.1. Quantum network for the QFT on $\mathbb{Z}_{2^m}$; the input qubits are labelled $x_{m-1},x_{m-2},x_{m-3},\ldots,x_0$ and the output qubits $y_1,y_2,\ldots,y_{m-1}$.]

3.1.4 Complexity Remarks

By the traditional discrete Fourier transform one usually understands the Fourier transform in $\mathbb{Z}_n$ given by

$$\hat f(x)=\sum_{y=0}^{n-1}e^{-\frac{2\pi ixy}{n}}f(y). \qquad(3.17)$$

For practical reasons, $n$ is typically chosen as $n=2^m$. The Fourier transform (3.17) can be used to approximate the continuous Fourier transform, and it has therefore tremendous importance in physics and engineering. By computing (3.17), we understand that we are given the vector

$$(f(0),f(1),\ldots,f(2^m-1)) \qquad(3.18)$$

as the input data, and the required output is

$$(\hat f(0),\hat f(1),\ldots,\hat f(2^m-1)). \qquad(3.19)$$

The naive method for computing the Fourier transform is to use formula (3.17) straightforwardly to find all the elements of (3.19); the time complexity of this method is $O((2^m)^2)$, as one can easily see. A significant improvement is obtained by using the fast Fourier transform (FFT), whose core can be expressed in the decomposition

$$\hat f(x)=\sum_{y=0}^{2^{m-1}-1}e^{-\frac{2\pi ixy}{2^{m-1}}}f(2y)+e^{-\frac{2\pi ix}{2^m}}\sum_{y=0}^{2^{m-1}-1}e^{-\frac{2\pi ixy}{2^{m-1}}}f(2y+1),$$

which very closely resembles the decomposition of Lemma 3.1.3 and essentially states that the vector (3.19) can be computed by combining two Fourier transforms in $\mathbb{Z}_{2^{m-1}}$. The time complexity obtained by recursively applying the above decomposition is $O(m2^m)$, significantly better than the naive method.
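For comparison, here is a minimal radix-2 sketch (in Python; illustrative only) of the decomposition just described:

```python
import cmath

def fft(f):
    """Radix-2 FFT for (3.17) with n = len(f) a power of two.
    Splits f into even- and odd-indexed halves exactly as in the
    decomposition above; runs in O(n log n) = O(m 2^m) time."""
    n = len(f)
    if n == 1:
        return f[:]
    even = fft(f[0::2])             # Fourier transform in Z_{2^{m-1}}
    odd = fft(f[1::2])              # Fourier transform in Z_{2^{m-1}}
    out = [0] * n
    for x in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * x / n) * odd[x]
        out[x] = even[x] + twiddle
        out[x + n // 2] = even[x] - twiddle
    return out

print([round(abs(v), 3) for v in fft([1, 0, 0, 0, 1, 0, 0, 0])])
# [2.0, 0.0, 2.0, 0.0, 2.0, 0.0, 2.0, 0.0]
```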
The problem of computing the QFT is quite different: in a very typical situation we have, instead of (3.18), a quantum superposition

$$c_0|0\rangle+c_1|1\rangle+\cdots+c_{2^m-1}|2^m-1\rangle, \qquad(3.20)$$

and the QFT operates on the coefficients of (3.20). At the same time, the physical representation size of (3.20) is small; in the system (3.20) there are only $m$ qubits, yet there are $2^m$ coefficients $c_i$. Earlier we learned that the QFT in $\mathbb{Z}_{2^m}$ can be done in time $O(m^2)$ (the Hadamard-Walsh transform can be done even in time $O(m)$), which is exponentially separate from the classical counterparts of the Fourier transform. But the major difference is that the physical representation sizes of (3.18) and (3.20) are also exponentially separate, the first one taking $\Omega(2^m)$ bits and the latter one $m$ qubits. Later, we will learn that the key idea behind many interesting fast quantum algorithms is to use quantum parallelism to convey some information of interest into the coefficients of (3.20) and then to compute the QFT rapidly.

3.2 Shor's Algorithm for Factoring Numbers

3.2.1 From Periods to Factoring

Given two prime numbers $p$ and $q$, it is an easy task to compute the product $n=pq$. The naive algorithm already has quadratic time complexity $O(\max\{|p|,|q|\}^2)$ (here $|p|$ denotes the length of $p$), but, by using more sophisticated methods, even performance $O(\max\{|p|,|q|\}^{1+\epsilon})$ is reachable for any $\epsilon>0$ (see [20]). On the other hand, the inverse problem, factorization, seems to be extremely hard to solve. Nowadays it is a widespread belief that there is no efficient classical algorithm for solving the factorization problem: given $n=pq$, a product of two primes, find $p$ and $q$. This problem has great importance in cryptography: the reliability of the famous public-key cryptosystem RSA is based on the assumption that no efficient algorithm exists for factorization. In 1994, Peter W. Shor [63] introduced a probabilistic polynomial-time quantum algorithm for factorization, which will be presented in this chapter.

First we show how to reduce factoring to finding the orders of elements in $\mathbb{Z}_n$. Let

$$n=p_1^{\alpha_1}\cdots p_k^{\alpha_k} \qquad(3.21)$$

be the (unknown) prime factorization of an odd number $n$ (the powers of 2 can be easily recognized and cancelled). We will also assume that $k\ge2$, i.e., $n$ has at least two different prime factors. In fact, there is a polynomial-time algorithm for recognizing powers (see Exercise 3), so we may as well assume that $n$ is not a prime power. We can be satisfied if we can find any nontrivial factor of $n$ rapidly, since the process can be recursively applied to all of the factors in order to uncover the whole factorization (3.21) swiftly.

Let us randomly choose, with uniform probability, an element $a\in\mathbb{Z}_n$, $a\neq1$. If $d=\gcd(a,n)>1$, then $d$ is a nontrivial factor of $n$ and our task is complete. If $d=1$, assume that we could somehow extract $r=\mathrm{ord}_n(a)$ ($\mathrm{ord}_n(a)$ means the order of the element $a$ in $\mathbb{Z}_n^*$; see Sections 8.1.3 and 8.1.4 for the number-theoretic notions used in this section). Then $a^r\equiv1\pmod n$, which means that $n$ divides $a^r-1$. Of course, this does not yet offer us a method for extracting a nontrivial factor of $n$, but if $r$ is even, we can easily factorize $a^r-1$:

$$a^r-1=(a^{\frac r2}-1)(a^{\frac r2}+1).$$

Now, since $n$ divides $a^r-1$, $n$ must share a factor with $a^{\frac r2}-1$ or with $a^{\frac r2}+1$ (or with both), and this factor can be extracted by using Euclid's algorithm. How could we guarantee that the factor of $n$ obtained in this way is nontrivial? Namely, if $n$ divides $a^{\frac r2}\pm1$, then Euclid's algorithm would only give $n$ as the greatest common divisor.
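The classical endgame of this reduction is short enough to write down; the following sketch (the function name is ours) assumes that the order $r$ has somehow been obtained:

```python
from math import gcd

def factor_from_order(n, a, r):
    """Given r = ord_n(a), try to extract a nontrivial factor of n
    via a^r - 1 = (a^(r/2) - 1)(a^(r/2) + 1) and Euclid's algorithm."""
    if r % 2 == 1:
        return None                      # bad luck: odd order
    h = pow(a, r // 2, n)                # a^(r/2), computed modulo n
    if h == n - 1:
        return None                      # bad luck: a^(r/2) = -1 (mod n)
    for d in (gcd(h - 1, n), gcd(h + 1, n)):
        if 1 < d < n:
            return d
    return None

print(factor_from_order(15, 7, 4))       # 3, a nontrivial factor of 15
```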
Fortunately, it is not so likely that $n$ divides $a^{\frac r2}\pm1$, as we will soon demonstrate. First, $n\mid(a^{\frac r2}-1)$ would imply that $a^{\frac r2}\equiv1\pmod n$, which would be absurd by the very definition of the order. But it may still happen that $n$ divides $a^{\frac r2}+1$ and does not share any factor with $a^{\frac r2}-1$. In this case, the factor given by Euclid's algorithm would also be $n$. On the other hand, $n\mid a^{\frac r2}+1$ means that $a^{\frac r2}\equiv-1\pmod n$, i.e., $a^{\frac r2}$ is a square root of 1 modulo $n$. In the next section we will use elementary number theory to show that, for a randomly chosen (with uniform distribution) element $a\in\mathbb{Z}_n^*$, the probability that $r=\mathrm{ord}(a)$ is even and that $a^{\frac r2}\not\equiv-1\pmod n$ is at least $\frac{9}{16}$. Consequently, assuming that $r=\mathrm{ord}(a)$ could be rapidly extracted, we could, with a reasonable probability, find a nontrivial factor of $n$.

There may still be some concern about the magnitude of the numbers $a^{\frac r2}\pm1$. Is it possible that these numbers would be so large that the procedure could be inefficient anyway? Fortunately, it is easy to answer this question: divisibility by $n$ is a periodic property with period $n$; so, once $r=\mathrm{ord}(a)$ is found, it suffices to find $a^{\frac r2}\pm1$ modulo $n$ (for modular exponentiation, rapid algorithms are known).

We may now focus on the problem of finding $r=\mathrm{ord}(a)$. Since $n$ is odd but not a prime power, the group $\mathbb{Z}_n^*$ is never cyclic, but each element $a\in\mathbb{Z}_n^*$ has an order $r=\mathrm{ord}(a)$ smaller than $\frac12\varphi(n)<\frac n2$. By the definition of the order, $a^{k+sr}\equiv a^k\pmod n$ for any integer $s$, which implies that the function $f:\mathbb{Z}\to\mathbb{Z}_n$ defined by

$$f(k)=a^k \pmod n \qquad(3.22)$$

has $r$ as a period. We will demonstrate in Section 3.2.3 that it is possible to find this period rapidly by using the QFT.

Example 3.2.1. $15=3\cdot5$ is the smallest number that can be factorized by the above method. $\mathbb{Z}_{15}^*=\{1,2,4,7,8,11,13,14\}$, and the orders of the elements are $1,4,2,4,4,2,4,2$, respectively. The only element for which $a^{\frac r2}\equiv-1\pmod{15}$ is $a=14\equiv-1\pmod{15}$. If, for example, $a=7$, then $15\mid7^4-1$ and $7^2-1\equiv3\pmod{15}$, $7^2+1\equiv5\pmod{15}$. We get $3=\gcd(3,15)$ and $5=\gcd(5,15)$ as nontrivial factors of 15.

3.2.2 Orders of the Elements in $\mathbb{Z}_n$

Let $n=p_1^{\alpha_1}\cdots p_k^{\alpha_k}$ be the prime factorization of an odd $n$, and $a\in\mathbb{Z}_n^*$ a randomly (and uniformly) chosen element. The Chinese Remainder Theorem gives us a useful decomposition

$$\mathbb{Z}_n^*=\mathbb{Z}_{p_1^{\alpha_1}}^*\times\cdots\times\mathbb{Z}_{p_k^{\alpha_k}}^* \qquad(3.23)$$

of $\mathbb{Z}_n^*$ as a direct product of cyclic groups. Recall that the cardinality of each factor is given by $|\mathbb{Z}_{p_i^{\alpha_i}}^*|=\varphi(p_i^{\alpha_i})=p_i^{\alpha_i-1}(p_i-1)$ (see Section 8.1.4). Note especially that each $\mathbb{Z}_{p_i^{\alpha_i}}^*$ has an even cardinality. For the decomposition of an element $a\in\mathbb{Z}_n^*$ we will use the notation

$$a=(a_1,\ldots,a_k), \qquad(3.24)$$

and the order of an element $a$ in $\mathbb{Z}_{p_i^{\alpha_i}}^*$ will be denoted by $\mathrm{ord}_{p_i^{\alpha_i}}(a)$. Because of the decomposition (3.24), to choose a random element $a\in\mathbb{Z}_n^*$ is to choose $k$ random elements $a_i$ as in (3.24), and vice versa. The following technical lemma will be quite useful for the probability estimations.

Lemma 3.2.1. Let $\varphi(p^e)=2^uv$, where $u\ge1$, $2\nmid v$, and let $s\ge0$ be a fixed integer. Then the probability that a randomly (and uniformly) chosen element $a\in\mathbb{Z}_{p^e}^*$ has an order of the form $2^st$ with $2\nmid t$ is at most $\frac12$.

Proof. If $s>u$, then the probability in question is 0, so we may assume $s\le u$. Let $g$ be a generator of $\mathbb{Z}_{p^e}^*$. Then $\mathbb{Z}_{p^e}^*=\{g^0,g^1,\ldots,g^{2^uv-1}\}$, and

$$\mathrm{ord}(g^j)=\frac{2^uv}{\gcd(j,2^uv)},$$

so an order of the form $2^st$ occurs if and only if $j=2^{u-s}w$, where $2\nmid w$. In the set $\{0,1,2,\ldots,2^uv-1\}$ there are exactly $2^sv$ multiples of $2^{u-s}$, namely
$0\cdot2^{u-s},\ 1\cdot2^{u-s},\ \ldots,\ (2^sv-1)\cdot2^{u-s}$, but only half of them have an odd multiplier. Therefore, the required probability is at most

$$\frac{\frac12\cdot2^sv}{2^uv}=\frac{1}{2^{u-s+1}}\le\frac12,$$

as claimed. $\square$

We can use the previous lemma to estimate the probability of having an odd order.

Lemma 3.2.2. The probability that $r=\mathrm{ord}(a)$ is odd for a uniformly chosen $a\in\mathbb{Z}_n^*$ is at most $\frac{1}{2^k}$.

Proof. Let $(a_1,\ldots,a_k)$ be the decomposition (3.24) of an element $a\in\mathbb{Z}_n^*$. Let $r_i=\mathrm{ord}_{p_i^{\alpha_i}}(a_i)$. Since $r=\mathrm{lcm}\{r_1,\ldots,r_k\}$ (Exercise 4), $r$ is odd if and only if each $r_i$ is odd. Putting $s=0$ in Lemma 3.2.1, we learn that the probability of having a random $a_i\in\mathbb{Z}_{p_i^{\alpha_i}}^*$ with an odd order is at most $\frac12$. Therefore, the probability $P_1$ of having an odd $r$ is at most $\frac{1}{2^k}$, as claimed. $\square$

What about the probability that $a^{\frac r2}\equiv-1\pmod n$? It turns out that Lemma 3.2.1 is useful for estimating that probability, too.

Lemma 3.2.3. Let $n=p_1^{\alpha_1}\cdots p_k^{\alpha_k}$ be the prime decomposition of an odd $n$ and $k\ge2$. If $r=\mathrm{ord}_n(a)$ is even, then the probability that $a^{\frac r2}\equiv-1\pmod n$ is at most $\frac{1}{2^k}$.

Proof. The congruence $a^{\frac r2}\equiv-1\pmod n$ implies that

$$a_i^{\frac r2}\equiv-1\pmod{p_i^{\alpha_i}} \qquad(3.25)$$

for each $i$. Let $r_i=\mathrm{ord}_{p_i^{\alpha_i}}(a_i)$, so $r=\mathrm{lcm}\{r_1,\ldots,r_k\}$. We write $r=2^st$ and $r_i=2^{s_i}t_i$, where $2\nmid t$ and $2\nmid t_i$. From the fact that $r_i\mid r$ for each $i$, it follows that $s_i\le s$, but the congruences (3.25) can only hold if $s_i=s$ for every $i$. For if $s_i<s$ for some $i$, then also $r_i\mid\frac r2$ (the assumption $k\ge2$ is needed here!), which implies that

$$a_i^{\frac r2}\equiv1\pmod{p_i^{\alpha_i}}. \qquad(3.26)$$

But (3.26) together with (3.25) gives $1\equiv-1\pmod{p_i^{\alpha_i}}$, which is absurd, since $p_i\neq2$. Therefore, the probability $P_2$ that $a^{\frac r2}\equiv-1\pmod n$ is at most the probability that $s_i=s$ for each $i$, which is

$$P_2\le\frac12\cdots\frac12=\frac{1}{2^k}$$

by Lemma 3.2.1. $\square$

Combining Lemmata 3.2.2 and 3.2.3, we get the following lemma.

Lemma 3.2.4. Let $n=p_1^{\alpha_1}\cdots p_k^{\alpha_k}$ be the prime factorization of an odd $n$ with $k\ge2$. Then, for a random $a\in\mathbb{Z}_n^*$ (chosen uniformly), the probability that $r=\mathrm{ord}_n(a)$ is even and $a^{\frac r2}\not\equiv-1\pmod n$ is at least $(1-\frac{1}{2^k})^2\ge\frac{9}{16}$.

3.2.3 Finding the Period

By Lemma 3.2.4 we know that, for a randomly (and uniformly) chosen $a\in\mathbb{Z}_n^*$, the probability that $r=\mathrm{ord}_n(a)$ is even and $a^{\frac r2}\not\equiv-1\pmod n$ is at least $\frac{9}{16}$ for any odd $n$ having at least two distinct prime factors. It was already discussed in Section 3.2.1 that knowledge of the order of such an element $a$ allows us to efficiently find a nontrivial factor of $n$. Thus, a procedure for computing the order would provide an efficient probabilistic method for finding the factors. Therefore, we have a good reason to believe that finding $r=\mathrm{ord}_n(a)$ is computationally a very difficult task.

It is clear that finding $r=\mathrm{ord}_n(a)$ reduces to finding the period of the function $f:\mathbb{Z}\to\mathbb{Z}_n$ defined by

$$f(k)=a^k \pmod n. \qquad(3.27)$$

In fact, if the period of $f$ is $r$, then $a^r=a^0\equiv1\pmod n$. It is a well-known fact that the Fourier transform can be used for extracting information about periodicity (see Section 8.2.5), and we will show that, by using the QFT, the period of (3.27) can be found with non-negligible probability. Of course, we cannot compute the QFT of (3.27) on the whole of $\mathbb{Z}$, so we choose a domain $\mathbb{Z}_m=\{0,1,\ldots,m-1\}$ with $m>n$ large enough to give room for the period to appear. We will also choose $m=2^l$ for two reasons: the first is that then there is a natural quantum representation of the elements of $\mathbb{Z}_m$ by $l$ qubits; and the second is that we have already learned in Section 3.1.3 how to compute the QFT in $\mathbb{Z}_{2^l}$. We will fix this $l$ later.
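The whole procedure of the following pages can be simulated classically for tiny inputs. The sketch below (NumPy-based; all names are ours) anticipates Example 3.2.2 with $n=15$, $a=7$, $m=16$: it builds the entangled state described below, applies the inverse QFT to each branch, and reads a period candidate off the convergents of $p/m$:

```python
import numpy as np
from fractions import Fraction

n, a, m = 15, 7, 16
ak = np.array([pow(a, k, n) for k in range(m)])
probs = np.zeros(m)
for v in set(ak):                          # one branch |p>|a^l> per value a^l
    col = (ak == v) / np.sqrt(m)
    amps = np.sqrt(m) * np.fft.ifft(col)   # unitary inverse QFT on Z_m
    probs += np.abs(amps) ** 2
print(np.nonzero(probs > 1e-9)[0])         # [ 0  4  8 12], probability 1/4 each

p = 12                                     # one possible observation
print(Fraction(p, m).limit_denominator(n)) # 3/4: the denominator 4 is the period
```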
The procedure requires quantum representations of $\mathbb{Z}_m$ and $\mathbb{Z}_n$ (the latter can be replaced by some set including $\mathbb{Z}_n$). The first takes $l$ qubits and the other at most $l$ qubits. For the representation, the notation $|x\rangle|y\rangle$, where $x\in\mathbb{Z}_m$, $y\in\mathbb{Z}_n$, will be used.

Quantum Algorithm for Finding the Orders

1. The first stage is to prepare a superposition

$$\frac{1}{\sqrt m}\sum_{k=0}^{m-1}|k\rangle|0\rangle \qquad(3.28)$$

by beginning with $|0\rangle|0\rangle$ and using the Hadamard-Walsh transform in $\mathbb{Z}_m$.

2. The second step is to compute $k\mapsto a^k\pmod n$ to get

$$\frac{1}{\sqrt m}\sum_{k=0}^{m-1}|k\rangle|a^k\rangle. \qquad(3.29)$$

Since the function $k\mapsto a^k\pmod n$ has period $r$, (3.29) can be written as

$$\frac{1}{\sqrt m}\sum_{l=0}^{r-1}\sum_{q=0}^{s_l}|qr+l\rangle|a^l\rangle, \qquad(3.30)$$

where $s_l$ is the greatest integer for which $s_lr+l<m$. It is clear that $s_l$ cannot vary very much: we always have $\frac mr-1-\frac lr<s_l<\frac mr-\frac lr$.

3. Compute the inverse QFT on $\mathbb{Z}_m$ to get

$$\frac{1}{\sqrt m}\sum_{l=0}^{r-1}\sum_{q=0}^{s_l}\frac{1}{\sqrt m}\sum_{p=0}^{m-1}e^{\frac{2\pi ip(qr+l)}{m}}|p\rangle|a^l\rangle
=\frac1m\sum_{l=0}^{r-1}\sum_{p=0}^{m-1}e^{\frac{2\pi ipl}{m}}\left(\sum_{q=0}^{s_l}e^{\frac{2\pi ipqr}{m}}\right)|p\rangle|a^l\rangle. \qquad(3.31)$$

4. Observe the quantum representation of $\mathbb{Z}_m$ in the superposition (3.31) to get some $p\in\mathbb{Z}_m$.

5. Find the convergents $\frac{p_i}{q_i}$ of the continued fraction expansion (see Section 8.4.2) of $\frac pm$ using Euclid's algorithm, and output the smallest $q_i$ such that $a^{q_i}\equiv1\pmod n$, if such a $q_i$ exists.

In the next section we will estimate the correctness probability of this method.

Example 3.2.2. Let $n=15$ and $a=7$ be the element whose period is to be found. Let us choose $m=16$.

1. The first step is to prepare

$$\frac14\sum_{k=0}^{15}|k\rangle|0\rangle.$$

2. Computation of $k\mapsto7^k\pmod{15}$ gives

$$\frac14\bigl(|0\rangle|1\rangle+|1\rangle|7\rangle+|2\rangle|4\rangle+|3\rangle|13\rangle+|4\rangle|1\rangle+\cdots+|15\rangle|13\rangle\bigr)$$
$$=\frac14\Bigl(\bigl(|0\rangle+|4\rangle+|8\rangle+|12\rangle\bigr)|1\rangle
+\bigl(|1\rangle+|5\rangle+|9\rangle+|13\rangle\bigr)|7\rangle
+\bigl(|2\rangle+|6\rangle+|10\rangle+|14\rangle\bigr)|4\rangle
+\bigl(|3\rangle+|7\rangle+|11\rangle+|15\rangle\bigr)|13\rangle\Bigr).$$

3. The inverse QFT in $\mathbb{Z}_{16}$ yields

$$\frac14\Bigl(\bigl(|0\rangle+|4\rangle+|8\rangle+|12\rangle\bigr)|1\rangle
+\bigl(|0\rangle+i|4\rangle-|8\rangle-i|12\rangle\bigr)|7\rangle
+\bigl(|0\rangle-|4\rangle+|8\rangle-|12\rangle\bigr)|4\rangle
+\bigl(|0\rangle-i|4\rangle-|8\rangle+i|12\rangle\bigr)|13\rangle\Bigr).$$

4. The probabilities of observing the elements 0, 4, 8, and 12 are $\frac14$ each.

5. The only convergent of $\frac{0}{16}$ is 0, which does not give the period. The convergents of $\frac{4}{16}$ are 0 and $\frac14$, and the latter gives the correct period 4. Observation of 8 does not give the period either, since the convergents of $\frac{8}{16}$ are 0 and $\frac12$; but 12 does: the convergents of $\frac{12}{16}$ are 0, 1, and $\frac34$.

3.3 The Correctness Probability

3.3.1 The Easy Case

The probability of seeing a particular $p\in\mathbb{Z}_m$ when observing (3.31) is

$$P(p)=\sum_{l=0}^{r-1}\left|\frac1m e^{\frac{2\pi ipl}{m}}\sum_{q=0}^{s_l}e^{\frac{2\pi ipqr}{m}}\right|^2
=\frac{1}{m^2}\sum_{l=0}^{r-1}\left|\sum_{q=0}^{s_l}e^{\frac{2\pi ipqr}{m}}\right|^2. \qquad(3.32)$$

We will show that, with a non-negligible probability, the value $p$ received by observing (3.31) will give us the period $r$. To get some insight into how to evaluate the probability (3.32), we will first assume that $r$ happens to divide $m$. Recalling that $s_l$ is the greatest
Unfortunately, this works for certain only if gcd(d, r) = 1; in case gcd(d, r) > 1 we will just get a factor of r as the nominator of ;;. = ~. Fortunately, we can show, by using the number-theoretical results, that the probability of having gcd(d, r) = 1 does not converge to zero too fast. Example 3.3.1. In Example 3.2.2, the period r = 4 divided m = 16, and the only elements that could be observed were 0, 4, 8, 12, the multiples of 4 = ~6. However, 0 and 8 did not give the period, since the multipliers 0 and 2 share a factor with 4. 3.3.2 The General Case Remembering that m = 21, it is clear that we cannot always be lucky enough to have a period r dividing m. We can, however, ask what is the probability of observing a p which is close to a multiple of ~. For if is small enough and gcd(d, r) = 1, then ~ is a convergent of ;;., and all the convergents of ;;. can be found efficiently by using Euclid's algorithm. (see Section 8.4.2). We will now fix m in such a way that the continued fraction method will apply. For any integer d E {O, 1, ... , r - I}, there is always a unique integer p such that the inequality 58 3. Fast Factorization 1 m 1 -- <p-d- <2 r - 2 (3.34) holds. Now choosing m such that n 2 ~ m < 2n2 guarantees that p i-l<l- < -1 - -d < Im r - 2m - 2n 2 2r2' which implies by Theorem 8.4.3 that if gcd(d, r) = 1, then ~ is a convergent to .E... m But there are only r integers p in Zm that satisfy (3.34), one for each d E {O, 1, ... ,r - I}. What is the probability of observing one? In other words, we should estimate the probability of observing p which satisfies r Ipr-dml ~"2 (3.35) for some d. Lemma 3.3.1. For n 2: 100, the observation of {3.31} will give apE Zm such that Ipr - dml ~ ; with a probability of not less than ~. Proof. Evaluating (3.32) (Exercise 5) we get for any fixed d by the periodicity of sin 2 . Since we now assume that x pr - dm takes values in [-i, iJ, we will estimate ° It is not difficult to show that f(x) is an even function, taking the maximum (Sl + 1)2 at x = and minima in [-i, il at the end points ±i. Therefore, 2 f( ) > sin ~(Sl + 1) x_ .2',T SIn Since SI 2m is the greatest integer such that sir 7r r 7rr -(1 - -) < - ( S I 2 m 2m and 7r r + 1) < -(1 + -), 2 m + 1 < m, we have 3.3 The Correctness Probability f(x) 2: sin2 1L(1 - L) ? 2 7rr m SIn 59 (3.36) 2m By the choice of m 2: n 2 , :;;. is negligible, and we can use estimations sin x ::; x and sin 2 ~(1 +x) 2: 1- (~x)2 (Exercise 6) to get f(x) 2: :2(~f(l- (~:)2), (3.37) which implies that 4 -1 ( 1- (--) 71" r 2) P(p) 2: . 71"2 r 2m (3.38) Factor 71" r 1- (--) 2m 2 (3.39) in (3.38) tends to 1 as :;;. -t 0, and for n 2: 100, (3.39) is already greater than 0.9999. Thus, the probability of observing a fixed p such that -~ < p-dr; ::; ~ is at least ~: for any n 2: 100. But there are exactly r such values p E Zm that -~ < p - dr; ::; ~: namely, the nearest integers to dr; for each d E {O, 1, ... , r - I}. Therefore, the probability of seeing some is at least ~ if n 2: 100. 0 We have learned that an individual p E Zm such that 1 m 1 -- <p-d- <2 r - 2 (3.40) ir' is observed with a probability of at least Therefore, any corresponding d E {O, 1, 2, ... ,r - I} is also obtained with a probability of at least We should finally estimate what the probability is that for such d also gcd( d, r) = 1 holds. The probability that gcd(d, r) = 1 for a given element d E {O, 1, ... ,r-l} is clearly 'f'~) , which can be estimated by using the following result. ir' Theorem 3.3.1 ([59]). 
For $r\ge3$,

$$\frac{r}{\varphi(r)}<e^\gamma\log\log r+\frac{2.50637}{\log\log r},$$

where $\gamma=0.5772156649\ldots$ is Euler's constant.$^1$

$^1$ Euler's constant is defined by $\gamma=\lim_{n\to\infty}(1+\frac12+\frac13+\cdots+\frac1n-\log n)$. In [59] it is even shown that, in the above theorem, 2.50637 can be replaced with 2.5 for all numbers $r$ except $r=2\cdot3\cdot5\cdot7\cdot11\cdot13\cdot17\cdot19\cdot23$.

Lemma 3.3.2. For $r\ge19$, the probability that $\gcd(d,r)=1$ holds for a uniformly chosen $d\in\{0,1,\ldots,r-1\}$ is at least $\frac{1}{4\log\log n}$.

Proof. It follows directly from Theorem 3.3.1 that, for $r\ge19$,

$$\frac{r}{\varphi(r)}<4\log\log r.$$

Therefore

$$\frac{\varphi(r)}{r}>\frac{1}{4\log\log r}\ge\frac{1}{4\log\log n},$$

which was claimed. $\square$

We combine the following facts to get Lemma 3.3.3:

- The probability that, for a randomly (and uniformly) chosen $a\in\mathbb{Z}_n^*$, the order $r=\mathrm{ord}_n(a)$ is even and $a^{\frac r2}\not\equiv-1\pmod n$ is at least $\frac{9}{16}$ (Lemma 3.2.4).
- The probability that observing (3.31) will give a $p$ such that $|p-d\frac mr|\le\frac12$ is at least $\frac25$ (Lemma 3.3.1).
- The probability that $\gcd(d,r)=1$ is at least $\frac{1}{4\log\log n}$ (Lemma 3.3.2).

Lemma 3.3.3. The probability that the quantum algorithm finds the order of an element of $\mathbb{Z}_n^*$ is at least

$$\frac{9}{160\log\log n}.$$

Remark 3.3.1. It was already mentioned in Section 3.1.4 that many interesting quantum algorithms are based on conveying some information of interest into the coefficients of a quantum superposition and then applying a fast QFT. The period finding in Shor's factoring algorithm is also based on this method, as we can see: to use quantum parallelism, it prepared a superposition

$$\frac{1}{\sqrt m}\sum_{k=0}^{m-1}|k\rangle|0\rangle.$$

The information of interest, namely the period, was moved into the coefficients by computing $k\mapsto a^k\pmod n$ to get the state (3.30). In that superposition, each $|a^l\rangle$ has

$$\frac{1}{\sqrt m}\sum_{q=0}^{s_l}|qr+l\rangle \qquad(3.41)$$

as its left multiplier. But in (3.41), a superposition of basis vectors $|x\rangle$, $x\in\mathbb{Z}_m$, already contains information about the period in the coefficients: the coefficient of $|x\rangle$ is nonzero if and only if $x=qr+l$, so the coefficient sequence of (3.41) also has period $r$, and this period is to be extracted by using the QFT. Notice that the elements $a^l$ are distinct for $l\in\{0,1,\ldots,r-1\}$, and hence it is guaranteed that the left multipliers (3.41) do not interfere. Thus, we can say that the computation $k\mapsto a^k$ was the operative step in conveying the periodicity to the superposition coefficients.

3.3.3 The Complexity of Shor's Factorization Algorithm

We can summarize the previous sections in the following algorithm:

Shor's quantum factorization algorithm

Input: An odd number $n$ that has at least two distinct prime factors.
Output: A nontrivial factor of $n$.

1. Choose an arbitrary $a\in\{1,2,\ldots,n-1\}$.
2. Compute $d=\gcd(a,n)$ by using Euclid's algorithm. If $d>1$, output $d$ and stop.
3. Compute the number $r$ by using the quantum algorithm of Section 3.2.3 ($m=2^l$ is chosen such that $n^2\le m<2n^2$). If $a^r\not\equiv1\pmod n$, or $r$ is odd, or $a^{\frac r2}\equiv-1\pmod n$, output "failure" and stop.
4. Compute $d_\pm=\gcd(n,a^{\frac r2}\pm1)$ by using Euclid's algorithm. Output the numbers $d_\pm$ and stop.

Step 2 can be done in time $O(\ell(n)^3)$.$^2$ For step 3, we first check whether $a$ has order less than 19, which can be done in time $O(\ell(n)^3)$. Then we use the Hadamard-Walsh transform in $\mathbb{Z}_m$, which can be done in time $O(\ell(m))=O(\ell(n))$. After that, computing $a^k\pmod n$ can be done in time $O(\ell(m)\ell(n)^2)=O(\ell(n)^3)$. The QFT in $\mathbb{Z}_m$ can be done in $O(\ell(m)^2)$ steps, and finally the computation of the convergents can be done in time $O(\ell(n)^3)$. Step 4 can also be done in time $O(\ell(n)^3)$.
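In outline, the classical control structure of the algorithm can be sketched as follows (Python; the quantum subroutine of Section 3.2.3 is replaced here by a brute-force stand-in, so this illustrates only the control flow, not the speed-up):

```python
import random
from math import gcd

def order_classical(a, n):
    """Stand-in for the quantum order-finding subroutine of Section 3.2.3."""
    r, x = 1, a % n
    while x != 1:
        x = x * a % n
        r += 1
    return r

def shor_factor(n):
    """One run of the classical control structure of Shor's algorithm."""
    a = random.randrange(2, n)                 # step 1
    d = gcd(a, n)                              # step 2
    if d > 1:
        return d
    r = order_classical(a, n)                  # step 3 (quantum in reality)
    if r % 2 == 1 or pow(a, r // 2, n) == n - 1:
        return None                            # "failure": try another a
    return gcd(n, pow(a, r // 2, n) - 1)       # step 4

while (f := shor_factor(15)) is None:          # repeat on failure
    pass
print(f)                                       # 3 or 5
```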
The overall complexity of the above algorithm is therefore $O(\ell(n)^3)$, but the success probability is only guaranteed to be at least $\Omega(\frac{1}{\log\log n})=\Omega(\frac{1}{\log\ell(n)})$. Thus, by running the above algorithm $O(\log\ell(n))$ times, we obtain a method that extracts a nontrivial factor of $n$ in time $O(\ell(n)^3\log\ell(n))$ with high probability.

$^2$ Recall that the notation $\ell(n)$ stands for the length of the number $n$; that is, the number of digits needed to represent $n$. This notation, of course, depends on the particular number system chosen, but in different systems (excluding the unary system) the length differs only by a multiplicative constant, and this difference can be embedded into the $O$-notation.

3.4 Exercises

1. Let $G$ be an abelian group. Find the Fourier transform of $f:G\to\mathbb{C}$ defined by

$$f(g)=\begin{cases}1,&\text{if }g=g_i,\\0,&\text{otherwise}.\end{cases}$$

If $f$ is viewed as a superposition (see the connection between formulae (3.3) and (3.4)), which state of $H$ corresponds to $f$?

2. Let $n=n_1n_2$, where $\gcd(n_1,n_2)=1$. Let also $F:\mathbb{Z}_{n_1}\times\mathbb{Z}_{n_2}\to\mathbb{Z}_n$ be the function given by $F((k_1,k_2))=a_1n_2k_1+a_2n_1k_2$, where $a_1$ (resp. $a_2$), given by the Chinese Remainder Theorem, is the multiplicative inverse of $n_2$ (resp. $n_1$) modulo $n_1$ (resp. $n_2$). Show that $F$ is an isomorphism.

3. a) Let $n$ and $k$ be fixed natural numbers. Devise a polynomial-time algorithm that decides whether $n=x^k$ for some integer $x$.
b) Based on the above algorithm, devise a polynomial-time algorithm which tells whether a given natural number $n$ is a nontrivial power of another number.

4. Let $n=p_1^{\alpha_1}\cdots p_k^{\alpha_k}$ be the prime factorization of $n$ and $a\in\mathbb{Z}_n^*$. Let $(a_1,\ldots,a_k)$ be the decomposition of $a$ given by the Chinese Remainder Theorem and $r_i=\mathrm{ord}_{p_i^{\alpha_i}}(a_i)$. Show that $\mathrm{ord}_n(a)=\mathrm{lcm}\{r_1,\ldots,r_k\}$.

5. a) Prove that $|e^{ix}-1|^2=4\sin^2\frac x2$.
b) Prove that

$$\left|\sum_{q=0}^{s}e^{\frac{2\pi ipqr}{m}}\right|^2=\frac{\sin^2\frac{\pi pr(s+1)}{m}}{\sin^2\frac{\pi pr}{m}}.$$

6. Use Taylor series to show that $\sin^2\frac\pi2(1+x)\ge1-(\frac\pi2x)^2$.

4. Finding the Hidden Subgroup

In this brief chapter we present a quantum algorithm which can be seen as a generalization of Shor's algorithm. Here we can explain, in a more structural manner, why quantum computers can speed up some computational problems. The so-called Simon's hidden subgroup problem can be stated in a general form as follows [12]:

Input: A finite abelian group $G$ and a function $p:G\to R$, where $R$ is a finite set.
Promise: There exists a nontrivial subgroup $H\le G$ such that $p$ is constant on each coset of $H$ and distinct on different cosets.
Output: A generating set for $H$.

The function $p$ is said to fulfil Simon's promise with respect to the subgroup $H$. If $h\in H$, then the elements $g$ and $g+h$ belong to the same coset of $H$ (see Section 8.1 for details), and since $p$ is constant on the cosets of $H$, we have $p(g+h)=p(g)$. We also say that $p$ is $H$-periodic.

In Section 4.2 we will see that some interesting computational problems can be reduced to solving the hidden subgroup problem. In a typical instance of this problem, $|G|$ is so large that an exhaustive search for the generators of $H$ would be hopeless, but the representation size$^1$ of an individual element $g\in G$ is only $\Theta(\log|G|)$. Moreover, $G$ can typically be described by a small number of generators. Here we will consider the description size of $G$, which is usually of order $(\log|G|)^k$ for some fixed $k$, as the size of the input. It is also usually assumed that the function $p:G\to R$ can be computed rapidly (in polynomial time) with respect to $\log|G|$. For better illustration, we will present the generalized form of Simon's quantum algorithm on $\mathbb{F}_2^m$ for solving the hidden subgroup problem.

$^1$ The elements of $G$ can be represented as binary strings, for instance.
Note also that in the group $\mathbb{F}_2^m$ any subgroup is a subspace as well, so instead of asking for a generating set, we could ask for a basis. By using the Gauss-Jordan elimination method, one can efficiently find a basis when a generating set is known (see Exercise 4).

4.1 Generalized Simon's Algorithm

In this section we will show how to solve Simon's subgroup problem on $G=\mathbb{F}_2^m$, the additive group of the $m$-dimensional vector space over the binary field $\mathbb{F}_2=\{0,1\}$.

4.1.1 Preliminaries

The results presented here also apply if $\mathbb{F}_2$ is replaced with any finite field $\mathbb{F}$, so we will, for a short moment, describe them in a more general form. More mathematical background can be found in Section 8.3. Here we temporarily use another notation for the inner product: if $x=(x_1,\ldots,x_m)$ and $y=(y_1,\ldots,y_m)$ are elements of $\mathbb{F}^m$, we denote their inner product by $x\cdot y=x_1y_1+\cdots+x_my_m$. An element $y$ is said to be orthogonal to $H$ if $y\cdot h=0$ for each $h\in H$. It is easy to verify that the elements orthogonal to $H$ form a subgroup (even a subspace) of $\mathbb{F}^m$, which is denoted by $H^\perp$. The importance of the following simple lemma will become clear in a moment.

Lemma 4.1.1. For any $y\in\mathbb{F}_2^m$,

$$\sum_{h\in H}(-1)^{h\cdot y}=\begin{cases}|H|,&\text{if }y\in H^\perp,\\0,&\text{otherwise}.\end{cases} \qquad(4.1)$$

Proof. This is analogous to proving the orthogonality of characters (Section 8.2.2). If $y\in H^\perp$, then $h\cdot y=0$ for each $h\in H$, and the claim is obvious. If $y\notin H^\perp$, then there exists an element $h_1\in H$ such that $h_1\cdot y\neq0$, and therefore

$$S=\sum_{h\in H}(-1)^{h\cdot y}=\sum_{h\in H}(-1)^{(h+h_1)\cdot y}=(-1)^{h_1\cdot y}\sum_{h\in H}(-1)^{h\cdot y}=(-1)^{h_1\cdot y}S=-S,$$

hence $S=0$. $\square$

Let $H\le\mathbb{F}^m$ be a subspace having dimension $d$. If $X$ is any generating subset of $H$, we can, by Exercise 4, efficiently find a basis of $H$. Any $d\times m$ matrix $M$ over $\mathbb{F}$ whose rows form a basis of $H$ is called a generator matrix of $H$. The name "generator matrix" is well justified by Exercise 1. It follows from Exercises 2, 3, 4, and 5 that, once a generator matrix of $H$ is given, we can efficiently compute a generator matrix of $H^\perp$. Moreover, since $H=(H^\perp)^\perp$ (verify), it follows that, in order to find a generating set for $H$, it suffices to find a generating set for $H^\perp$.

For a subgroup $H$ of $\mathbb{F}^m$, let $T$ be a set that consists of exactly one element from each coset (see Section 8.1.2) of $H$. Such a set is called a transversal of $H$ in $\mathbb{F}^m$, and it is clear that $|T|=[\mathbb{F}^m:H]=\frac{|\mathbb{F}^m|}{|H|}$. It is also easy to verify that $\dim H^\perp=m-\dim H$, and hence the equation $|\mathbb{F}^m|=|H|\cdot|H^\perp|$ must hold.

4.1.2 The Algorithms

We will use $m$ qubits to represent the elements of $\mathbb{F}_2^m$, and approximately $\log_2[\mathbb{F}_2^m:H]=\log_2\frac{2^m}{|H|}=m-\log_2|H|$ qubits for representing the values of the function $p:\mathbb{F}_2^m\to R$. Here the description size of $G=\mathbb{F}_2^m$ is only $m$, and we assume that $p$ is computable in polynomial time with respect to the input size. Using only the function $p$, we can find a set of generators for $H$. It can be shown [12] that, if no other knowledge of $p$ is given, i.e., $p$ is given as a black-box function (see Section 5.1.2), solving this problem by any classical algorithm requires exponential time, even when a bounded error probability is allowed. On the other hand, we will now demonstrate that this problem can be solved in polynomial time by using a quantum circuit. The algorithm for finding a basis of $H$ consists of finding a basis $Y$ of $H^\perp$, then computing the basis of $H$.
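Since $H=(H^\perp)^\perp$, both classical stages reduce to computing an orthogonal complement over $\mathbb{F}_2$. The following sketch (names ours, in the spirit of Exercises 2-5) computes a generator matrix of $H^\perp$ from one of $H$ by Gauss-Jordan elimination:

```python
import numpy as np

def nullspace_gf2(M):
    """Generator matrix of H-perp from a generator matrix M of H over F_2."""
    M = np.array(M, dtype=int) % 2
    d, m = M.shape
    pivots, row = [], 0
    for col in range(m):
        hit = next((r for r in range(row, d) if M[r, col]), None)
        if hit is None:
            continue
        M[[row, hit]] = M[[hit, row]]           # swap rows
        for r in range(d):                      # clear the pivot column
            if r != row and M[r, col]:
                M[r] = (M[r] + M[row]) % 2
        pivots.append(col)
        row += 1
    free = [c for c in range(m) if c not in pivots]
    basis = []
    for f in free:                              # one basis vector per free column
        v = np.zeros(m, dtype=int)
        v[f] = 1
        for r, c in enumerate(pivots):
            v[c] = M[r, f]
        basis.append(v)
    return np.array(basis)

H = [[1, 0, 1], [0, 1, 1]]        # a 2-dimensional subspace of F_2^3
print(nullspace_gf2(H))           # [[1 1 1]]: H-perp is spanned by (1,1,1)
```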
As mentioned earlier, the latter stage can be efficiently performed by using a classical computer. The problem of finding a basis for $H^\perp$ would be easier if we knew the dimension of $H^\perp$ in advance, as we will see. In that case, the algorithm could be described as follows.

Algorithm A: Finding the Basis (Dimension Known)

1. If $d=\dim H^\perp=0$, output $\emptyset$ and stop.
2. Use Algorithm B below to choose a set $Y$ of $d$ elements of $H^\perp$ uniformly.
3. Use the Gauss-Jordan elimination method (Exercise 4) to check whether $Y$ is an independent set. If $Y$ is independent, output $Y$ and stop; otherwise, give "failure" as the output.

In a moment, we will see that the second step of the above algorithm gives a basis of $H^\perp$ with a probability of at least $\frac14$ (Lemma 4.1.2). Before that, we will describe how to perform that step rapidly by using a quantum circuit. It is worth mentioning that, in the above algorithm, a quantum computer is needed only to produce a set of elements of $H^\perp$.

Algorithm B: Choosing Elements Uniformly

1. Using the Hadamard-Walsh transform on $\mathbb{F}_2^m$, prepare

$$\frac{1}{\sqrt{2^m}}\sum_{x\in\mathbb{F}_2^m}|x\rangle|0\rangle. \qquad(4.2)$$

2. Compute $p$ to get

$$\frac{1}{\sqrt{2^m}}\sum_{x\in\mathbb{F}_2^m}|x\rangle|p(x)\rangle=\frac{1}{\sqrt{2^m}}\sum_{t\in T}\sum_{x\in H}|t+x\rangle|p(t)\rangle. \qquad(4.3)$$

3. Use the Hadamard-Walsh transform on $\mathbb{F}_2^m$ to get

$$\frac{1}{2^m}\sum_{t\in T}\sum_{x\in H}\sum_{y\in\mathbb{F}_2^m}(-1)^{(t+x)\cdot y}\,|y\rangle|p(t)\rangle
=\frac{1}{2^m}\sum_{t\in T}\sum_{y\in\mathbb{F}_2^m}(-1)^{t\cdot y}\left(\sum_{x\in H}(-1)^{x\cdot y}\right)|y\rangle|p(t)\rangle
=\frac{|H|}{2^m}\sum_{t\in T}\sum_{y\in H^\perp}(-1)^{t\cdot y}\,|y\rangle|p(t)\rangle.$$

4. Make an observation to get an element $y\in H^\perp$.

In step 3, the equality between the last and the second-to-last formulae follows from Lemma 4.1.1. It is easy to see that the probability of observing a particular $y\in H^\perp$ is

$$P(y)=\sum_{t\in T}\left(\frac{|H|}{2^m}(-1)^{t\cdot y}\right)^2=\frac{2^m}{|H|}\cdot\frac{|H|^2}{2^{2m}}=\frac{|H|}{2^m}=\frac{1}{|H^\perp|},$$

so Algorithm B works correctly: the elements of $H^\perp$ are drawn uniformly. It is also clear that the above quantum algorithm runs in polynomial time.

For the second step of Algorithm A, we will now analyze the probability that, when choosing $d$ elements uniformly from a $d$-dimensional subspace, the chosen vectors are independent. The analysis is straightforward, as seen in the following lemma.

Lemma 4.1.2. Let $t\le d$ and $y_1,\ldots,y_t$ be randomly chosen vectors, with uniform distribution, in a $d$-dimensional vector space over $\mathbb{F}_2$. Then the probability that $y_1,\ldots,y_t$ are linearly independent is at least $\frac14$.

Proof. The cardinality of a $d$-dimensional vector space over $\mathbb{F}_2$ is $2^d$. Therefore, the probability that $\{y_1\}$ is a linearly independent set is $\frac{2^d-1}{2^d}$, since only choosing $y_1=0$ makes $\{y_1\}$ dependent. Suppose now that $y_1,\ldots,y_{i-1}$ have been chosen in such a way that $S=\{y_1,\ldots,y_{i-1}\}$ is a linearly independent set. Then $S$ generates a subspace of dimension $i-1$. Hence, there are $2^{i-1}$ choices for $y_i$ that make $S\cup\{y_i\}$ linearly dependent. Thus, the probability that the set $\{y_1,\ldots,y_t\}$ is an independent set is

$$p=\frac{2^d-2^0}{2^d}\cdot\frac{2^d-2^1}{2^d}\cdots\frac{2^d-2^{t-1}}{2^d}. \qquad(4.4)$$

Then
Such an algorithm is called a Las Vegas algorithm, cf. Section 2.1.2. An interesting method for finding the basis in polynomial time with certainty is described in [12]. Remark 4.1.2. Algorithm B for choosing an element in H1. resembles the period-finding algorithm of Section 3.2.3. The first step was to prepare a quantum superposition of all the elements of 1F2'. Then, by utilizing quantum parallelism, function p was computed simultaneously on all elements of 1F2'. This was the operative step to convey information of interest to the superposition coefficients, and this information was extracted by using QFT. Notice that since p is different on distinct cosets, all the vectors Ip( t)) are orthogonal and, therefore, the states L (_l)t. y IY) Ip(t)) yEHJ. with different values of p( t) do not interfere. To conclude this section, we describe how to find the basis of H 1., when d = dim H 1. is not known in advance. Because of Algorithm A, this is done if we can describe an algorithm which can find out the dimension of H 1. . The dimension of H1. can be found by utilizing Algorithm A (which uses Algorithm B as a subprocedure). By Lemma 4.1.2 we see that, if we choose uniformly t ::; d elements from a subspace of dimension d, then the probability that the chosen elements are independent is at least We utilize this as follows: if D = dim H 1., the (unknown) dimension of H , we just guess that the dimension of H 1. is d (0 ::; d ::; m), and run Algorithm A (with the d we t 68 4. Finding the Hidden Subgroup guessed). If d ::; D, we obtain, according to Lemma 4.1.2, a set of d linearly independent vectors with a probability of at least Once we get such a set of vectors, we can tell, with certainty, that the guess d::; D was correct (the number of linearly independent vectors that a vector space can contain is at most the dimension of the space). On the other hand, if d > D, the set of d vectors we obtain can never be linearly independent. After repeating Algorithm A a number of times, we can be convinced that our guess d > D was incorrect. Anyway, Algorithm A can always be used to decide whether d ::; D holds with a correctness probability of at least We will now describe how to find D, the dimension of Hl... To make the description easier, we will regard all other elements (D, H, and p) fixed, but only the interval I where the dimension may be found, is mentioned as an input. For an interval 1= {k, k + 1, ... ,k + r} we use notation M(I) = l~J for the "midpoint" of I, B(I) = {k, k+ 1, ... ,M(I) + I} for the "initial part" and T(I) = {M(I), ... ,k+r} for the "final part" of I. In the following algorithm, D always stands for the true (unknown) dimension of Hl.. and initially, the input of the following algorithm is the interval I = {a, 1, ... ,m}. i. i. Algorithm C: Determining the Dimension 1. If III = 1, output the unique element of I and stop. 2. If III > 1, run Algorithm A to decide if D ::; M(I). If the answer is "yes", apply this algorithm with input B(I); otherwise, apply this algorithm with input T(I). Algorithm C thus finds the dimension D by recursively cutting the interval {a, 1, ... ,m} into two approximately equally long parts, and using Algorithm A to decide which one is the part containing D. It is therefore clear that Algorithm C stops after 8 (log2 m) recursive calls. 
Since, when running Algorithm C, we have to make only $k=\Theta(\log_2m)$ decisions based on Algorithm A, each of which is correct with a probability of at least $\frac14$, Algorithm C works with a correctness probability of

$$p\ge\left(\frac14\right)^k=\left(\frac14\right)^{\Theta(\log_2m)}=\Theta\left(\frac{1}{m^2}\right).$$

Therefore, repeating Algorithm C $O(m^2)$ times, we get an algorithm which gives the dimension $D$ with a nonvanishing correctness probability. It is left as an exercise to analyze the running times of Algorithms A, B, and C and to conclude that the generators of a hidden subgroup can be found in polynomial time, provided that $p$ can be computed in polynomial time.

Remark 4.1.3. It is an open problem whether the hidden subgroup problem can be solved in polynomial time by using a quantum computer for non-abelian groups as well. For progress in this direction, see [28] and the references therein. The hidden subgroup problem for non-abelian groups is of special interest, since the graph isomorphism problem can be reduced to it.

4.2 Examples

We will now demonstrate how the hidden subgroup problem can be applied to computational problems. These examples are due to [45].

4.2.1 Finding the Order

This problem lies at the very heart of Shor's factorization algorithm: Let $n$ be a large integer and $a$ another integer such that $\gcd(a,n)=1$. Denote $r=\mathrm{ord}_n(a)$, i.e., $r$ is the least positive integer that satisfies $a^r\equiv1\pmod n$. In this problem, we have the group $\mathbb{Z}$, and the hidden subgroup is $r\mathbb{Z}$, whose generator $r$ should be found. The function $p(x)=a^x+n\mathbb{Z}$ satisfies Simon's promise, since

$$p(x)=p(y)\iff a^x+n\mathbb{Z}=a^y+n\mathbb{Z}\iff n\mid(a^x-a^y)\iff r\mid(x-y)\iff x+r\mathbb{Z}=y+r\mathbb{Z}.$$

Because $\mathbb{Z}$ is an infinite group, we cannot directly solve this problem by using the algorithm of the previous section, but using the finite groups $\mathbb{Z}_{2^l}$ instead of $\mathbb{Z}$ already gives approximations that are good enough, as shown in the previous chapter.

4.2.2 Discrete Logarithm

If $F$ is a cyclic group of order $q$, then there is a generator $g\in F$ such that $F=\{1=g^0,g^1,g^2,\ldots,g^{q-1}\}$ (see Section 8.1 for the notions of group theory). Hence, if we are given a generator $g$ of $F$ and another element $a\in F$, then $a$ can be uniquely expressed as $a=g^r$, where $r\in\{0,1,2,\ldots,q-1\}$. The discrete logarithm problem is to find $r$. Since $g^q=1$ (and $q$ is the least positive number with this property), we have that $g^{r_1}=g^{r_2}$ if and only if $r_1\equiv r_2\pmod q$. Hence, instead of regarding the exponent of $g$ as an integer, we can regard it as an element of $\mathbb{Z}_q$ as well. In fact, the mapping $\mathbb{Z}_q\to F$, $r+q\mathbb{Z}\mapsto g^r$ is an isomorphism.

We can reduce the discrete logarithm to finding a hidden subgroup as follows: If $a=g^r\in F$, let $G=\mathbb{Z}_q\times\mathbb{Z}_q$. The hidden subgroup $H=\{(r\alpha,\alpha)\mid\alpha\in\mathbb{Z}_q\}$ is generated by the element $(r,1)$. That the function $p:\mathbb{Z}_q\times\mathbb{Z}_q\to F$ defined as

$$p(x,y)=g^xa^{-y}$$

satisfies Simon's promise can be verified easily:

$$p(x_1,y_1)=p(x_2,y_2)\iff g^{x_1}a^{-y_1}=g^{x_2}a^{-y_2}\iff g^{x_1-ry_1}=g^{x_2-ry_2}.$$

But the last equation is true if and only if $x_1-ry_1=x_2-ry_2$ (when regarded as elements of $\mathbb{Z}_q$), which, in turn, is true if and only if $x_1-x_2=r(y_1-y_2)$. This is equivalent to the condition $(x_1,y_1)+H=(x_2,y_2)+H$. On the other hand, there may be other generators of $H$ than $(r,1)$, but using a number-theoretic argumentation similar to that used in Section 3.3.2, we can see that the discrete logarithm can be found with a nonvanishing probability in polynomial time.

Remark 4.2.1. The reliability of the U.S.
Digital Signature Algorithm is based on the assumption that the discrete logarithm problem on finite fields remains intractable on classical computers [46].

4.2.3 Simon's Original Problem

Simon [65] originally formulated his problem as follows:

Input: An integer $m\ge1$ and a function $p:\mathbb{F}_2^m\to R$, where $R$ is a finite set.
Promise: There exists a nonzero element $s\in\mathbb{F}_2^m$ such that, for all $x,y\in\mathbb{F}_2^m$, $p(x)=p(y)$ if and only if $x=y$ or $x=y+s$.
Output: The element $s$.

It is plain to see that choosing $G=\mathbb{F}_2^m$ and $H=\{0,s\}$, the subspace generated by $s$, makes this problem a special case of the more general problem presented at the beginning of this chapter. Simon's problem is of historical interest, since it was the first to provide an example of a problem that can be solved, with nonvanishing probability, in polynomial time by using a quantum computer, but which requires exponential time by any classical algorithm if $p$ is considered a black-box function (see Section 5.1.2).

4.3 Exercises

1. Show that if $M$ is a generator matrix of $H$, then $H=\{xM\mid x\in\mathbb{F}^d\}$.

2. Show that if $M$ is a generator matrix of $H$ and $N$ is a $d'\times m$ matrix such that $MN^T=0$, then the subspace generated by the rows of $N$ is orthogonal to $H$.

3. The elementary row operations on a matrix over a field $\mathbb{F}$ are:
a) Swapping two rows, $x_i\leftrightarrow x_j$.
b) Multiplying a row by a nonzero element of $\mathbb{F}$: $x_i\mapsto cx_i$, $c\neq0$.
c) Adding to a row another row multiplied by any element of $\mathbb{F}$: $x_i\mapsto x_i+cx_j$.
Show that if $M'$ is obtained from $M$ by using the elementary row operations, then the rows of $M$ and $M'$ generate exactly the same subspace.

4. A matrix is in reduced row-echelon form if:
a) For any row containing a nonzero entry, the first such entry is 1. The first nonzero entry in a row is called the pivot.
b) The rows containing only zero entries, if there are such rows, are the last ones.
c) The pivot of each row is on the right-hand side of the pivot of the row above.
d) Each column containing a pivot of a row has zeros everywhere else.
If only conditions a), b), and c) are satisfied, we say that the matrix is in row-echelon form. Prove the following Gauss-Jordan elimination theorem: each $k\times m$ matrix can be transformed into reduced row-echelon form by using $O(k^3m)$ field operations. Moreover, the nonzero rows of a row-echelon form matrix are linearly independent.

5. Prove that if $M=(I_d\mid M')$ is a generator ($d\times m$) matrix of $H$, then $N=(-M'^T\mid I_{m-d})$ is a generator matrix of $H^\perp$.

5. Grover's Search Algorithm

5.1 Search Problems

Let us consider a children's game called hiding the key: one player hides the key in the house, and the others try to find it. The one who has hidden the key is permanently advising the others by using phrases like "freezing", "cold", "warm", and "hot", depending on how close the seekers are to the hiding place. Without this advice, the game would obviously last much longer. Or can you develop a strategy for finding the key without searching through the entire house?

5.1.1 Satisfiability Problem

There are many problems in computer science which closely resemble searching for a tiny key in a large house. We will shortly discuss the problem of finding a solution to an NP-complete 3-satisfiability problem: we are given a propositional expression in conjunctive normal form, each clause being a disjunction of three literals (a Boolean variable or a negation of a Boolean variable). In the original form of the problem, the task is to find a satisfying truth assignment, if any such exists.
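To make the search flavor concrete, here is the obvious exhaustive strategy on a toy instance (a Python sketch; the instance and the names are ours):

```python
from itertools import product

# A 3-CNF formula as a list of clauses; literal +i means x_i, -i means NOT x_i.
formula = [(1, 2, 3), (-1, 2, -3), (1, -2, 3)]

def satisfying_assignment(formula, n):
    """Exhaustive search over all 2^n truth assignments."""
    for bits in product((False, True), repeat=n):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause)
               for clause in formula):
            return bits
    return None

print(satisfying_assignment(formula, 3))   # (False, False, True)
```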
Let us then imagine an advisor who has enough power to always tell at once whether a given Boolean expression has a satisfying truth assignment or not. If such an advisor were provided, finding a satisfying assignment would no longer be a difficult problem: we could just substitute 1 for some variable and ask the advisor whether the resulting Boolean expression, with fewer variables, had the demanded assignment or not. If our choice was incorrect, we would flip the substituted value and proceed recursively. Unfortunately, the advisor's problem of telling whether a satisfying valuation exists is an NP-complete problem, and in light of our present knowledge, it is very unlikely that there is a fast solution to this problem in the real world.

But let us continue our thought experiment by assuming that somebody, let us call him a verifier, knows a satisfying assignment for a Boolean expression but is not willing to tell it to us. Quite surprisingly, there exist so-called zero-knowledge protocols (see [62] for more details), which the verifier can use to convince us that he really knows a satisfying valuation without revealing even a single bit of his knowledge. Thus, it is possible that we are quite sure that a satisfying truth assignment exists, yet we do not have a clue what it might be!

The obvious strategy for finding a satisfying assignment is to search through all of the possible assignments. But if there are $n$ Boolean variables in the expression, there are $2^n$ assignments, and because we do not have a supernatural advisor, our task seems quite hopeless for large $n$, at least in the general case. In fact, no method faster than exhaustive search is known in the general case.

5.1.2 Probabilistic Search

In what follows, we will slightly generalize our search problem. Instead of seeking a satisfying assignment for a Boolean expression, we will generally talk about functions $f:\mathbb{F}_2^n\to\mathbb{F}_2$, and our search problem is to find an $x\in\mathbb{F}_2^n$ such that $f(x)=1$ (if any such $x$ exists). Notice that, under the assumption that $f$ is computable in polynomial time, this model is enough to represent NP problems, because it includes the NP-complete 3-satisfiability problem. This model also applies [31] to unordered database search, where we have to find a specific item in a huge database. Here the database consists of $2^n$ items, and $f$ is the function giving the value 1 on the required item and 0 on the rest, thereby telling us whether the item under investigation was the required one.

Another simplification, a huge one this time, is to assume that $f$ is a so-called black-box function, i.e., we do not know how to compute $f$, but we can query $f$, and the value is returned to us instantly in one computational step. With that assumption, we will demonstrate in the next chapter how to derive a lower bound on the number of queries to $f$ needed to find an item $x$ such that $f(x)=1$. In the next chapter we will, in fact, present a general strategy for finding lower bounds on the number of queries for other goals, too.

Now we will study probabilistic search, i.e., we will omit the requirement that the search strategy always gives a value $x$ such that $f(x)=1$, and require only that it does so with a nonvanishing probability. This means that the search algorithm will give the required $x$ with at least some constant probability $0<p\le1$ for any $n$. This can be seen as a natural generalization of a search which returns the required item with certainty.

Remark 5.1.1.
Provided that the probabilistic search strategy is rapid, we can use a very standard argument to say that we can find $x$ rapidly in practice: the probability that, after $m$ attempts, we have not found $x$ is at most $(1-p)^m\le e^{-pm}$, which is smaller than any given $\epsilon>0$ when $m>-\frac{\ln\epsilon}{p}$. Thus, we can reduce the error probability to any positive constant $\epsilon$ just by repeating the original search a constant number of times.

It is easy to see that any classical deterministic search algorithm which always gives an $x$ such that $f(x)=1$ (if such an $x$ exists) will require $N=2^n$ queries to $f$: if some algorithm makes fewer than $N$ queries, we can modify $f$ in such a way that the answer is no longer correct.

Remark 5.1.2. In the above argumentation it is, of course, essential that we only handle black-box functions. If we are, instead, given an algorithm for computing $f$, it is very difficult to say how easily this algorithm itself will give information about the $x$ to be found. In fact, the question about the computational work required to recover $x$ from the algorithm for $f$ lies at the very heart of the difficult open problem of whether P $\neq$ NP.

We cannot do much better than deterministic search by allowing randomization: fix $y\in\mathbb{F}_2^n$ and consider a black-box function $f_y$ defined by

$$f_y(x)=\begin{cases}1,&\text{if }x=y,\\0,&\text{otherwise}.\end{cases} \qquad(5.1)$$

If we draw distinct elements $x_1,\ldots,x_k\in\mathbb{F}_2^n$ with uniform distribution, the probability of finding $y$ is $\frac{k}{2^n}$, so we would need at least $pN$ queries to find $y$ with a probability of at least $p$. Using non-uniform distributions will not offer any relief:

Lemma 5.1.1. Let $N=2^n$ and $f$ be a black-box function. Assume that $A_f$ is a probabilistic algorithm that makes queries to $f$ and returns an element $x\in\mathbb{F}_2^n$. If, for any non-constant $f$, the probability that $f(x)=1$ is at least $p>0$, then there is an $f$ such that $A_f$ makes at least $pN-1$ queries.

Proof. Let $f_y$ be as in (5.1) and $P_y(k)$ be the probability that $A_{f_y}$ returns $y$ using $k$ queries. By assumption, $P_y(k)\ge p$, and we will demonstrate that there is some $y\in\mathbb{F}_2^n$ such that $P_y(k)\le\frac{k+1}{N}$. First we show by induction that

$$\sum_{y\in\mathbb{F}_2^n}P_y(k)\le k+1.$$

If $k=0$, then $A_{f_y}$ gives any $x\in\mathbb{F}_2^n$ with some probability $p_x$ not depending on $y$, and thus

$$\sum_{y\in\mathbb{F}_2^n}P_y(0)=\sum_{y\in\mathbb{F}_2^n}p_y=1.$$

Assume, then, that

$$\sum_{y\in\mathbb{F}_2^n}P_y(k-1)\le k.$$

On the $k$th query, $A_{f_y}$ queries $f(y)$ with some probability $q_y$, and therefore $P_y(k)\le P_y(k-1)+q_y$. Thus,

$$\sum_{y\in\mathbb{F}_2^n}P_y(k)\le\sum_{y\in\mathbb{F}_2^n}P_y(k-1)+\sum_{y\in\mathbb{F}_2^n}q_y\le k+1.$$

Because there are $N=2^n$ different choices for $y$, there must exist one with $P_y(k)\le\frac{k+1}{N}$. It follows that $\frac{k+1}{N}\ge p$, so $k\ge pN-1$. $\square$

5.1.3 Quantum Search with One Query

Let us continue studying black-box functions $f:\mathbb{F}_2^n\to\mathbb{F}_2$. The natural question arising now is whether we can devise a faster search method on a quantum computer, using quantum parallelism to query many values of a black-box function simultaneously. The very first thing we need to fix is the notion of a quantum black-box function. The notion we follow here is widely accepted: let $x\in\mathbb{F}_2^n$. In order to make a black-box function query $f(x)$ on a quantum computer, we will utilize a source register $|x\rangle$ ($n$ qubits) and a target qubit $|b\rangle$. The query operator $Q_f$ is a linear mapping defined by

$$Q_f|x\rangle|b\rangle=|x\rangle|b\oplus f(x)\rangle, \qquad(5.2)$$

where $\oplus$ means addition modulo 2 or, in other words, the exclusive-or operation. We can now easily see that (5.2) defines a unitary mapping. In fact, the vector set $B=\{|x\rangle|b\rangle\mid x\in\mathbb{F}_2^n,\ b\in\mathbb{F}_2\}$ spans a $2^{n+1}$-dimensional Hilbert space. The mapping $Q_f$ operates as a permutation on $B$, and it is a well-known fact that any such mapping is unitary.
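Concretely, $Q_f$ can be tabulated as a permutation matrix (a NumPy sketch; the names are ours):

```python
import numpy as np

def query_operator(f, n):
    """Q_f of (5.2) as a 2^(n+1) x 2^(n+1) permutation matrix.
    The basis state |x>|b> is indexed by the integer 2*x + b."""
    N = 2 ** (n + 1)
    Q = np.zeros((N, N))
    for x in range(2 ** n):
        for b in (0, 1):
            Q[2 * x + (b ^ f(x)), 2 * x + b] = 1
    return Q

n = 3
f = lambda x: int(x == 5)            # the black box f_y of (5.1) with y = 5
Q = query_operator(f, n)
print(np.allclose(Q.T @ Q, np.eye(2 ** (n + 1))))   # True: Q_f is unitary
```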
Moreover, since $Q_fQ_f|x\rangle|b\rangle=|x\rangle|b\oplus f(x)\oplus f(x)\rangle=|x\rangle|b\rangle$, the inverse of $Q_f$ is $Q_f$ itself.

Let us now fix $y\in\mathbb{F}_2^n$ and try to devise a quantum search algorithm for the function $f_y$ defined in (5.1). The very first idea is to generate the state

$$\frac{1}{\sqrt{2^n}}\sum_{x\in\mathbb{F}_2^n}|x\rangle|0\rangle \qquad(5.3)$$

beginning with $|0\rangle|0\rangle$ and using the Hadamard-Walsh transform $H_n$ (cf. Section 3.1.2). After that, one could make a single query $Q_{f_y}$ to obtain the state

$$\frac{1}{\sqrt{2^n}}\sum_{x\in\mathbb{F}_2^n}|x\rangle|f_y(x)\rangle. \qquad(5.4)$$

But if we now observe the last qubit of the state (5.4), we will see 1 (and, after that, observing the first register, get the required $y$) only with a probability of $\frac{1}{2^n}$, so we would not gain any advantage over just guessing $y$. But quantum search can improve on the probabilistic search, as we will now demonstrate. Having the state (5.3), we can flip the target qubit (to $|1\rangle$), and then apply the Hadamard transform to that qubit to get the state

$$\frac{1}{\sqrt{2^n}}\sum_{x\in\mathbb{F}_2^n}|x\rangle\frac{1}{\sqrt2}(|0\rangle-|1\rangle)=\frac{1}{\sqrt{2^{n+1}}}\left(\sum_{x\in\mathbb{F}_2^n}|x\rangle|0\rangle-\sum_{x\in\mathbb{F}_2^n}|x\rangle|1\rangle\right). \qquad(5.5)$$

If $x\neq y$, then $Q_{f_y}|x\rangle|0\rangle=|x\rangle|0\rangle$ and $Q_{f_y}|x\rangle|1\rangle=|x\rangle|1\rangle$, but $Q_{f_y}|y\rangle|0\rangle=|y\rangle|1\rangle$ and $Q_{f_y}|y\rangle|1\rangle=|y\rangle|0\rangle$; so, querying $f_y$ by the query operator $Q_{f_y}$ on the state (5.5) gives us the state

$$\frac{1}{\sqrt{2^{n+1}}}\left(\sum_{x\neq y}|x\rangle|0\rangle+|y\rangle|1\rangle-\sum_{x\neq y}|x\rangle|1\rangle-|y\rangle|0\rangle\right)
=\frac{1}{\sqrt{2^{n+1}}}\left(\sum_{x\neq y}|x\rangle(|0\rangle-|1\rangle)+|y\rangle(|1\rangle-|0\rangle)\right)$$

$$=\frac{1}{\sqrt{2^n}}\sum_{x\in\mathbb{F}_2^n}(-1)^{f_y(x)}|x\rangle\,\frac{1}{\sqrt2}(|0\rangle-|1\rangle). \qquad(5.6)$$

Notice that preparing the target qubit in the superposition $\frac{1}{\sqrt2}(|0\rangle-|1\rangle)$ before applying the query operator was used to encode the value $f_y(x)$ in the sign $(-1)^{f_y(x)}$. This is a very basic technique in quantum computation. In the continuation we will not need the target qubit anymore, and instead of (5.6) we will write only

$$\frac{1}{\sqrt{2^n}}\sum_{x\in\mathbb{F}_2^n}(-1)^{f_y(x)}|x\rangle. \qquad(5.7)$$

So far we have seen that, by applying the query operator only once, we get the state (5.7), ignoring the target qubit. We continue as follows: we write (5.7) in the form

$$\frac{1}{\sqrt{2^n}}\sum_{x\in\mathbb{F}_2^n}|x\rangle-\frac{2}{\sqrt{2^n}}|y\rangle$$

and apply the Hadamard-Walsh transform $H_n$ to get

$$|0\rangle-\frac{2}{2^n}\sum_{x\in\mathbb{F}_2^n}(-1)^{x\cdot y}|x\rangle=\left(1-\frac{2}{2^n}\right)|0\rangle-\frac{2}{2^n}\sum_{x\neq0}(-1)^{x\cdot y}|x\rangle. \qquad(5.8)$$

In the superposition (5.8), we separate $|0\rangle$ from all other states by designing a function $f_0:\mathbb{F}_2^n\to\mathbb{F}_2$ which gets the value 1 at 0 and the value 0 for all other $x\in\mathbb{F}_2^n$. Such a function can be constructed by using $O(n)$ Boolean gates and, by the considerations of Section 2.3.3, it is also possible to construct $f_0$ by using $O(n)$ quantum gates and several ancilla qubits. We will store the value of $f_0$ in an additional qubit to obtain the state

$$\left(1-\frac{2}{2^n}\right)|0\rangle|1\rangle-\frac{2}{2^n}\sum_{x\neq0}(-1)^{x\cdot y}|x\rangle|0\rangle. \qquad(5.9)$$

Now, observation of the last qubit of (5.9) will result in

$$|0\rangle|1\rangle \qquad(5.10)$$

with a probability of $(1-\frac{2}{2^n})^2=1-\frac{4}{2^n}+\frac{4}{4^n}$, and in

$$\frac{1}{\sqrt{2^n-1}}\sum_{x\neq0}(-1)^{x\cdot y}|x\rangle|0\rangle=\frac{1}{\sqrt{2^n-1}}\left(\sum_{x\in\mathbb{F}_2^n}(-1)^{x\cdot y}|x\rangle-|0\rangle\right)|0\rangle \qquad(5.11)$$

with a probability of $\frac{4}{2^n}-\frac{4}{4^n}$. Applying the Hadamard-Walsh transform $H_n$ once more to the states (5.10) and (5.11) yields (ignoring the last qubit)

$$\frac{1}{\sqrt{2^n}}\sum_{z\in\mathbb{F}_2^n}|z\rangle \qquad(5.12)$$

and

$$\frac{1}{\sqrt{2^n-1}}\left(\sqrt{2^n}\,|y\rangle-\frac{1}{\sqrt{2^n}}\sum_{z\in\mathbb{F}_2^n}|z\rangle\right), \qquad(5.13)$$

respectively. Finally, observing (5.12) and (5.13) will give us $y$ with probabilities $\frac{1}{2^n}$ and $\frac{2^n-1}{2^n}$, respectively. Keeping in mind the probabilities of seeing the states (5.10) and (5.11), we find that the total probability of observing $y$ is approximately $\frac{5}{2^n}$, which is approximately 2.5 times better than the best we could get by a randomized algorithm that queries $f_y$ only once (see the proof of Lemma 5.1.1).

Soon we will see that, by using a quantum circuit, it is still possible to improve this probability.
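A direct simulation of the above single-query procedure (a Python sketch; the names are ours) confirms the figure of approximately $5/2^n$:

```python
import numpy as np
from functools import reduce

n, y = 5, 19
N = 2 ** n
Hn = reduce(np.kron, [np.array([[1, 1], [1, -1]]) / np.sqrt(2)] * n)

state = np.ones(N) / np.sqrt(N)       # uniform superposition
state[y] *= -1                        # one query: the sign-encoded state (5.7)
state = Hn @ state                    # the state (5.8)

p_rest = 1 - state[0] ** 2            # probability of the outcome (5.11)
branch = state.copy()
branch[0] = 0                         # post-measurement state, renormalized
branch /= np.linalg.norm(branch)
p_y = state[0] ** 2 / N + p_rest * (Hn @ branch)[y] ** 2
print(round(p_y, 4), 5 / N)           # 0.1448 vs 5/2^n = 0.15625
```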
The next section is devoted to Lov Grover's ingenious idea of using quantum operations to amplify the probability of finding the required element.

5.2 Grover's Amplification Method

We will still use the black-box function $f_y:\mathbb{F}_2^n\to\mathbb{F}_2$ from the previous section as an example. The task is to find $y$ by querying $f_y$, and we will follow the method of Lov Grover to demonstrate that, by using a quantum circuit, we can find $y$ with nonvanishing probability by using $O(\sqrt{2^n})$ queries. We will also show how to generalize this method to all black-box functions.

5.2.1 Quantum Operators for Grover's Search Algorithm

The query operator $Q_{f_y}$, which is used to call for the values of $f_y$, uses $n$ qubits for the source register and one for the target. We will also need a quantum operator $R_n$ defined on $n$ qubits and operating as $R_n|0\rangle=-|0\rangle$ and $R_n|x\rangle=|x\rangle$ if $x\neq0$. If we index the rows and columns of a matrix by the elements of $\mathbb{F}_2^n$ (ordered as binary numbers), we can easily express $R_n$ as a $2^n\times2^n$ matrix:

$$R_n=\begin{pmatrix}-1&0&0&\cdots&0\\0&1&0&\cdots&0\\0&0&1&\cdots&0\\\vdots&&&\ddots&\vdots\\0&0&0&\cdots&1\end{pmatrix}. \qquad(5.14)$$

But we require that the operations in a quantum circuit should be local, i.e., there should be a fixed upper bound on the number of qubits on which each gate operates. Let us, therefore, pay some attention to how to decompose (5.14) into local operations. The decomposition can obviously be made by making use of the function $f_0:\mathbb{F}_2^n\to\mathbb{F}_2$, which is 1 at 0 and 0 everywhere else. As discussed in the earlier section, a quantum circuit $F_n$ for this function can be constructed by using $O(n)$ simple quantum gates (which operate on at most three qubits) and some, say $k_n$, ancilla qubits. To summarize: we can obtain a quantum circuit $F_n$ on $n+k_n$ qubits operating as

$$F_n|x\rangle|0\rangle|0\rangle=|x\rangle|f_0(x)\rangle|a_x\rangle. \qquad(5.15)$$

Using this, we can proceed as follows: we begin with $|x\rangle|1\rangle|0\rangle$, and then operate with the Hadamard transform on the target qubit to get

$$|x\rangle\,\frac{1}{\sqrt2}(|0\rangle-|1\rangle)\,|0\rangle.$$

After this, we call $F_n$ to get

$$(-1)^{f_0(x)}|x\rangle\,\frac{1}{\sqrt2}(|0\rangle-|1\rangle)\,|a_x\rangle.$$

Using first the reverse circuit $F_n^{-1}$ and then the Hadamard transform on the target qubit gives us

$$(-1)^{f_0(x)}|x\rangle|1\rangle|0\rangle.$$

Composing all of this together, we get an operation

$$|x\rangle|1\rangle|0\rangle\mapsto(-1)^{f_0(x)}|x\rangle|1\rangle|0\rangle,$$

which is the required operation $R_n$ with some ancilla qubits. As usual, we will not write down the ancilla explicitly.

The other tool needed for Grover's search is to encode the value $f(x)$ in the sign; but this can be easily achieved by preparing the target qubit in a superposition:

$$Q_f|x\rangle\frac{1}{\sqrt2}(|0\rangle-|1\rangle)=(-1)^{f(x)}|x\rangle\frac{1}{\sqrt2}(|0\rangle-|1\rangle). \qquad(5.16)$$

Instead of (5.16), we will introduce the notation $V_f|x\rangle=(-1)^{f(x)}|x\rangle$ for the modified query operator, again omitting the ancilla qubit.

5.2.2 Amplitude Amplification

Grover's method for finding an element $y\in\mathbb{F}_2^n$ that satisfies $f(y)=1$ (we will call such a $y$ a solution hereafter) can be viewed as iterative amplitude amplification. The basic element is the quantum operator

$$G_n=-H_nR_nH_nV_f \qquad(5.17)$$

working on $n$ qubits, which represent the elements $x\in\mathbb{F}_2^n$. In the above definition, $V_f$ is the modified query operator working as $V_f|x\rangle=(-1)^{f(x)}|x\rangle$, $H_n$ is the Hadamard transform, and $R_n$ is the operator which reverses the sign of $|0\rangle$: $R_n|0\rangle=-|0\rangle$ and $R_n|x\rangle=|x\rangle$ for $x\neq0$. It is interesting to see that $H_nR_nH_n$ can be written in quite a simple form as a $2^n\times2^n$ matrix: if we let the elements $x\in\mathbb{F}_2^n$ index the rows and the columns, we find that, since
5.2.2 Amplitude Amplification

Grover's method for finding an element $y\in\mathbb{F}_2^n$ that satisfies $f(y)=1$ (we will call such a $y$ a solution hereafter) can be viewed as iterative amplitude amplification. The basic element is the quantum operator

$$G_n = -H_nR_nH_nV_f \qquad (5.17)$$

working on the $n$ qubits which represent the elements $x\in\mathbb{F}_2^n$. In the above definition, $V_f$ is the modified query operator working as $V_f|x\rangle = (-1)^{f(x)}|x\rangle$, $H_n$ is the Hadamard transform, and $R_n$ is the operator which reverses the sign of $|0\rangle$: $R_n|0\rangle = -|0\rangle$ and $R_n|x\rangle = |x\rangle$ for $x\neq0$.

It is interesting to see that $H_nR_nH_n$ can be written in quite a simple form as a $2^n\times2^n$ matrix. Let the elements $x\in\mathbb{F}_2^n$ index the rows and the columns; the element of $H_n$ at location $(x,y)$ is $\frac{1}{\sqrt{2^n}}(-1)^{x\cdot y}$. Writing generally $A_{xy}$ for the element of a $2^n\times2^n$ matrix $A$ at location $(x,y)$, we see (recall the matrix representation of $R_n$ in the previous section) that

$$(H_nR_nH_n)_{xy} = \sum_{z\in\mathbb{F}_2^n}(H_nR_n)_{xz}(H_n)_{zy} = \frac{1}{2^n}\sum_{z\in\mathbb{F}_2^n}(-1)^{x\cdot z}(R_n)_{zz}(-1)^{z\cdot y} = \frac{1}{2^n}\Big(-2+\sum_{z\in\mathbb{F}_2^n}(-1)^{(x+y)\cdot z}\Big) = \begin{cases}-\frac{2}{2^n}, & \text{if } x\neq y,\\[2pt] 1-\frac{2}{2^n}, & \text{if } x=y.\end{cases}$$

Thus $H_nR_nH_n$ has the matrix representation

$$\begin{pmatrix} 1-\frac{2}{2^n}&-\frac{2}{2^n}&\cdots&-\frac{2}{2^n}\\ -\frac{2}{2^n}&1-\frac{2}{2^n}&\cdots&-\frac{2}{2^n}\\ \vdots&&\ddots&\vdots\\ -\frac{2}{2^n}&-\frac{2}{2^n}&\cdots&1-\frac{2}{2^n} \end{pmatrix},$$

which can also be expressed as

$$H_nR_nH_n = I - 2P, \qquad (5.18)$$

where $I$ is the $2^n\times2^n$ identity matrix and $P$ is the $2^n\times2^n$ projection matrix whose every entry is $\frac{1}{2^n}$. In fact, it is quite an easy task to verify that $P$ represents the projection onto the one-dimensional subspace generated by the vector

$$\psi = \frac{1}{\sqrt{2^n}}\sum_{x\in\mathbb{F}_2^n}|x\rangle.$$

Thus, using the notations of Chapter 7, we can write $P = |\psi\rangle\langle\psi|$, but this notation is not crucial here. It is more essential that representation (5.18) gives us an easy method for finding the effect of $-H_nR_nH_n$ on a general superposition

$$\sum_{x\in\mathbb{F}_2^n}c_x|x\rangle. \qquad (5.19)$$

Writing

$$A = \frac{1}{2^n}\sum_{x\in\mathbb{F}_2^n}c_x$$

(the average of the amplitudes), we find that

$$\sum_{x\in\mathbb{F}_2^n}c_x|x\rangle = \sum_{x\in\mathbb{F}_2^n}A|x\rangle + \sum_{x\in\mathbb{F}_2^n}(c_x-A)|x\rangle \qquad (5.20)$$

is the decomposition of (5.19) into two orthogonal vectors: the first belongs to the subspace spanned by $\psi$; the second belongs to the orthogonal complement of that subspace. In fact, it is clear that the first summand on the right-hand side of (5.20) belongs to the subspace generated by $\psi$, and the orthogonality of the summands is easy to verify:

$$\Big\langle\sum_{x\in\mathbb{F}_2^n}A|x\rangle\;\Big|\;\sum_{y\in\mathbb{F}_2^n}(c_y-A)|y\rangle\Big\rangle = \sum_{x,y}A^*(c_y-A)\langle x\mid y\rangle = \sum_{x\in\mathbb{F}_2^n}A^*c_x - \sum_{x\in\mathbb{F}_2^n}A^*A = A^*2^nA - 2^nA^*A = 0.$$

Therefore, applying $I-2P$ to the decomposition (5.20) negates the first summand and keeps the second, and

$$-H_nR_nH_n\sum_{x\in\mathbb{F}_2^n}c_x|x\rangle = \sum_{x\in\mathbb{F}_2^n}A|x\rangle - \sum_{x\in\mathbb{F}_2^n}(c_x-A)|x\rangle = \sum_{x\in\mathbb{F}_2^n}(2A-c_x)|x\rangle. \qquad (5.21)$$

Expression (5.21) explains why the operation $-H_nR_nH_n$ is also called inversion about the average: it operates on a single amplitude $c_x$ by multiplying it by $-1$ and adding twice the average.

We use the following example to explain how $G_n = -H_nR_nH_nV_f$ is used to amplify the desired amplitude in order to find the solution (recall that the solution is an element $y\in\mathbb{F}_2^n$ which satisfies $f(y)=1$). In this example, we consider a function $f_5:\mathbb{F}_2^n\to\mathbb{F}_2$ which takes the value 1 at $y=(0,\ldots,0,1,0,1)$ and is 0 everywhere else. The search begins with the superposition

$$\frac{1}{\sqrt{2^n}}\sum_{x\in\mathbb{F}_2^n}|x\rangle$$

having uniform amplitudes. Figure 5.1 depicts this superposition, with $c_0 = c_1 = \cdots = c_{2^n-1} = \frac{1}{\sqrt{2^n}}$.

[Fig. 5.1. Amplitudes of the initial configuration.]

The modified query operator $V_{f_5}$ flips the signs of all those amplitudes that are coefficients of a vector $|x\rangle$ satisfying $f_5(x)=1$. In this example, $y=(0,\ldots,0,1,0,1)$ is the only one having this property, and $c_5$ becomes $-\frac{1}{\sqrt{2^n}}$. This is depicted in Figure 5.2.

[Fig. 5.2. Amplitudes after one query.]

In the case of Figure 5.2, the average of the amplitudes is

$$A = \frac{2^n-2}{2^n}\cdot\frac{1}{\sqrt{2^n}},$$

which is still very close to $\frac{1}{\sqrt{2^n}}$. Thus, the inversion-about-average operator $-H_nR_nH_n$ will perform the transformation

$$c_5 \mapsto 2A+\frac{1}{\sqrt{2^n}} \approx \frac{3}{\sqrt{2^n}}, \qquad c_x \mapsto 2A-\frac{1}{\sqrt{2^n}} \approx \frac{1}{\sqrt{2^n}} \quad (x\neq5),$$

which is illustrated in Figure 5.3.

[Fig. 5.3. Amplitudes after $-H_nR_nH_nV_{f_5}$.]

It is thus possible to find the required $y=(0,\ldots,0,1,0,1)$ with a probability of approximately $\frac{9}{2^n}$ by using just a single query. This is approximately 4.5 times better than a classical randomized search can do.
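The amplification is easy to watch numerically. The following NumPy sketch (again my own helper code, not the book's) builds $G_n = -H_nR_nH_nV_f$ for the example function $f_5$ and prints the probability of the marked element over a few iterations; after one iteration it reproduces the $\approx 9/2^n$ figure above.

```python
import numpy as np

n = 10
N = 2 ** n
f5 = lambda x: 1 if x == 5 else 0

# Hadamard-Walsh transform H_n as an N x N matrix: (H_n)_{xy} = (-1)^{x.y}/sqrt(N).
H1 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
Hn = H1
for _ in range(n - 1):
    Hn = np.kron(Hn, H1)

Rn = np.eye(N); Rn[0, 0] = -1.0                    # R_n from Section 5.2.1
Vf = np.diag([(-1.0) ** f5(x) for x in range(N)])  # modified query operator
Gn = -Hn @ Rn @ Hn @ Vf                            # Grover iterate (5.17)

state = np.full(N, 1 / np.sqrt(N))                 # uniform initial superposition
for r in range(1, 4):
    state = Gn @ state
    print(f"after {r} iterations: P(seeing 5) = {state[5] ** 2:.4f}")
```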
5.2.3 Analysis of the Amplification Method

In this section, we will find out the effect of iterative use of the mapping $G_n = -H_nR_nH_nV_f$, but instead of a blackbox function $f:\mathbb{F}_2^n\to\mathbb{F}_2$ that admits only one solution, we will study a general $f$ having $k$ solutions (recall that here a solution means a vector $x\in\mathbb{F}_2^n$ such that $f(x)=1$).

Let the notation $T\subseteq\mathbb{F}_2^n$ stand for the set of solutions, and $F = \mathbb{F}_2^n\setminus T$ for the set of non-solutions. Thus $|T|=k$ and $|F|=2^n-k$. Assume that after $r$ iteration steps, the state is

$$t_r\sum_{x\in T}|x\rangle + f_r\sum_{x\in F}|x\rangle. \qquad (5.22)$$

The modified query operator $V_f$ will then give us

$$-t_r\sum_{x\in T}|x\rangle + f_r\sum_{x\in F}|x\rangle, \qquad (5.23)$$

and the average of the amplitudes of (5.23) is

$$A = \frac{1}{2^n}\big(-kt_r+(2^n-k)f_r\big).$$

The operator $-H_nR_nH_n$ will, therefore, transform (5.23) into

$$t_{r+1}\sum_{x\in T}|x\rangle + f_{r+1}\sum_{x\in F}|x\rangle,$$

where $t_{r+1} = 2A+t_r = (1-\frac{2k}{2^n})t_r + (2-\frac{2k}{2^n})f_r$ and $f_{r+1} = 2A-f_r = -\frac{2k}{2^n}t_r + (1-\frac{2k}{2^n})f_r$. Collecting all of this together, we find that $G_n = -H_nR_nH_nV_f$ operates on (5.22) as the transformation

$$\begin{pmatrix}t_{r+1}\\ f_{r+1}\end{pmatrix} = \begin{pmatrix}1-\frac{2k}{2^n} & 2-\frac{2k}{2^n}\\ -\frac{2k}{2^n} & 1-\frac{2k}{2^n}\end{pmatrix}\begin{pmatrix}t_r\\ f_r\end{pmatrix}. \qquad (5.24)$$

Therefore, to find out the effect of $G_n$ on the uniformly weighted initial superposition

$$\frac{1}{\sqrt{2^n}}\sum_{x\in\mathbb{F}_2^n}|x\rangle,$$

we have to solve (5.24) with the boundary conditions $t_0 = f_0 = \frac{1}{\sqrt{2^n}}$. This can be done as follows: it is clear that $t_r$ and $f_r$ are real and, since $G_n$ is a unitary operator, the equation

$$kt_r^2+(2^n-k)f_r^2 = 1 \qquad (5.25)$$

must hold for each $r$. That is to say, each point $(t_r,f_r)$ lies on the ellipse defined by equation (5.25). Therefore, we can write

$$t_r = \frac{1}{\sqrt k}\sin\theta_r, \qquad f_r = \frac{1}{\sqrt{2^n-k}}\cos\theta_r$$

for some number $\theta_r$. Recursion (5.24) can then be rewritten as

$$\begin{cases}\sin\theta_{r+1} = \big(1-\frac{2k}{2^n}\big)\sin\theta_r + \frac{2}{2^n}\sqrt{2^n-k}\,\sqrt k\,\cos\theta_r,\\[2pt] \cos\theta_{r+1} = -\frac{2}{2^n}\sqrt{2^n-k}\,\sqrt k\,\sin\theta_r + \big(1-\frac{2k}{2^n}\big)\cos\theta_r.\end{cases} \qquad (5.26)$$

Recall that $k$ is the number of elements in $\mathbb{F}_2^n$ that satisfy $f(y)=1$; hence $1-\frac{2k}{2^n}\in[-1,1]$. Therefore, we can choose $\omega\in[0,\pi]$ such that $\cos\omega = 1-\frac{2k}{2^n}$. Then $\sin\omega = \frac{2}{2^n}\sqrt{2^n-k}\,\sqrt k$ and (5.26) becomes

$$\begin{cases}\sin\theta_{r+1} = \sin(\theta_r+\omega),\\ \cos\theta_{r+1} = \cos(\theta_r+\omega).\end{cases}$$

The boundary condition gives us $\sin^2\theta_0 = \frac{k}{2^n}$, and it is easy to verify that the solution of the recursion is

$$t_r = \frac{1}{\sqrt k}\sin(r\omega+\theta_0), \qquad f_r = \frac{1}{\sqrt{2^n-k}}\cos(r\omega+\theta_0),$$

where $\theta_0\in[0,\pi/2]$ and $\omega\in[0,\pi]$ are chosen in such a way that $\sin^2\theta_0 = \frac{k}{2^n}$ and $\cos\omega = 1-\frac{2k}{2^n}$. In fact, then $\cos\omega = 1-2\sin^2\theta_0 = \cos2\theta_0$, so $\omega = 2\theta_0$, and we have obtained the following lemma.

Lemma 5.2.1. The solution of (5.24) with the boundary conditions $t_0 = f_0 = \frac{1}{\sqrt{2^n}}$ is

$$\begin{cases}t_r = \frac{1}{\sqrt k}\sin((2r+1)\theta_0),\\[2pt] f_r = \frac{1}{\sqrt{2^n-k}}\cos((2r+1)\theta_0),\end{cases} \qquad (5.27)$$

where $\theta_0\in[0,\pi/2]$ is determined by $\sin^2\theta_0 = \frac{k}{2^n}$.

We would like to find a suitable value for $r$ to maximize the probability of finding a solution. Since there are $k$ solutions, the probability of seeing some solution is

$$kt_r^2 = \sin^2((2r+1)\theta_0). \qquad (5.28)$$

We would then like to find a non-negative $r$, as small as possible, such that $\sin^2((2r+1)\theta_0)$ is as close to 1 as possible; i.e., the least positive integer $r$ such that $(2r+1)\theta_0$ is as close to $\frac\pi2$ as possible. We notice that

$$(2r+1)\theta_0 = \frac\pi2 \iff r = -\frac12+\frac{\pi}{4\theta_0},$$

and, since $\theta_0\geq\sin\theta_0 = \sqrt{\frac{k}{2^n}}$, we have $\frac{\pi}{4\theta_0}\leq\frac\pi4\sqrt{\frac{2^n}{k}}$, and therefore after approximately $\frac\pi4\sqrt{\frac{2^n}{k}}$ iterations, the probability of seeing a desired element $y\in\mathbb{F}_2^n$ is quite close to 1. To be more precise:

Theorem 5.2.1. Let $f:\mathbb{F}_2^n\to\mathbb{F}_2$ be such that there are $k$ elements $x\in\mathbb{F}_2^n$ satisfying $f(x)=1$. Assume that $0<k\leq\frac34\cdot2^n$, and let $\theta_0\in[0,\pi/3]$ be chosen such that $\sin^2\theta_0 = \frac{k}{2^n}\leq\frac34$. After $\lfloor\frac{\pi}{4\theta_0}\rfloor$ iterations of $G_n$ on the initial superposition

$$\frac{1}{\sqrt{2^n}}\sum_{x\in\mathbb{F}_2^n}|x\rangle,$$

the probability of seeing a solution is at least $\frac14$.

Proof. The probability of seeing a desired element is given by $\sin^2((2r+1)\theta_0)$, and we just saw that $r = -\frac12+\frac{\pi}{4\theta_0}$ would give a probability of 1. Thus, we only have to estimate the error when $-\frac12+\frac{\pi}{4\theta_0}$ is replaced by $\lfloor\frac{\pi}{4\theta_0}\rfloor$. Clearly

$$\Big\lfloor\frac{\pi}{4\theta_0}\Big\rfloor = -\frac12+\frac{\pi}{4\theta_0}+\delta$$

for some $\delta$ with $|\delta|\leq\frac12$. Therefore,

$$\Big(2\Big\lfloor\frac{\pi}{4\theta_0}\Big\rfloor+1\Big)\theta_0 = \frac\pi2+2\delta\theta_0,$$

so the distance of $(2\lfloor\frac{\pi}{4\theta_0}\rfloor+1)\theta_0$ from $\frac\pi2$ is $|2\delta\theta_0|\leq\theta_0\leq\frac\pi3$. It follows that

$$\sin^2\Big(\Big(2\Big\lfloor\frac{\pi}{4\theta_0}\Big\rfloor+1\Big)\theta_0\Big) \geq \sin^2\Big(\frac\pi2+\frac\pi3\Big) = \frac14. \qquad\Box$$
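The iteration count and success probability of Theorem 5.2.1 are one formula each; a small helper of my own (hypothetical name, not from the book) makes the scaling visible:

```python
import math

def grover_schedule(n, k):
    """Iteration count and success probability from Lemma 5.2.1 / Theorem 5.2.1.

    Returns (r, p): r = floor(pi / (4 * theta_0)) iterations of G_n, and
    p = sin^2((2r + 1) * theta_0), the probability of observing a solution.
    Assumes 0 < k <= (3/4) * 2**n, as in Theorem 5.2.1.
    """
    theta0 = math.asin(math.sqrt(k / 2 ** n))
    r = int(math.pi / (4 * theta0))
    return r, math.sin((2 * r + 1) * theta0) ** 2

# One solution among 2^20 elements: about (pi/4) * 2^10 iterations suffice.
print(grover_schedule(20, 1))   # roughly (804, 0.999...)
```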
If $k>\frac34\cdot2^n$, we can find a desired solution $x$ with a probability of at least $\frac34$ just by guessing, and if $k=0$, then $G_n$ does not alter the initial superposition at all.

Grover's method for quantum search is summarized in the following algorithm:

Grover's Search Algorithm
Input: A blackbox function $f:\mathbb{F}_2^n\to\mathbb{F}_2$ and $k = |\{x\in\mathbb{F}_2^n\mid f(x)=1\}|$.
Output: An $y\in\mathbb{F}_2^n$ such that $f(y)=1$ (a solution), if such an element exists.
1. If $k>\frac34\cdot2^n$, choose $y\in\mathbb{F}_2^n$ with uniform probability and stop.
2. Otherwise, compute $r = \lfloor\frac{\pi}{4\theta_0}\rfloor$, where $\theta_0\in[0,\pi/3]$ is determined by $\sin^2\theta_0 = \frac{k}{2^n}$.
3. Prepare the initial superposition $\frac{1}{\sqrt{2^n}}\sum_{x\in\mathbb{F}_2^n}|x\rangle$ by using the Hadamard transform $H_n$.
4. Apply the operator $G_n$ $r$ times.
5. Observe to get some $y\in\mathbb{F}_2^n$.

If $k>\frac34\cdot2^n$, then the first step clearly produces a correct output with a probability of at least $\frac34$. Otherwise, according to Theorem 5.2.1, the algorithm gives a solution with a probability of at least $\frac14$. In each case, we can find a solution with nonvanishing probability.

If $k=1$ and $n$ is large, then $\lfloor\frac{\pi}{4\theta_0}\rfloor\approx\frac\pi4\sqrt{2^n}$, so using $O(\sqrt{2^n})$ queries to $f$ we can find a solution with nonvanishing probability, which is essentially better than any classical randomized algorithm can do.

The case $k = \frac14\cdot2^n$ is also very interesting. Then $\sin^2\theta_0 = \frac14$, so $\theta_0 = \frac\pi6$. According to (5.28), the probability of seeing a solution after one single iteration of $G_n$ is $\sin^2(3\theta_0) = \sin^2\frac\pi2 = 1$. Thus, for $k = \frac14\cdot2^n$, we can find a solution with certainty using $G_n$ only once, which is clearly impossible using any classical search strategy.

In a typical situation we, unfortunately, do not know the value of $k$ in advance. In the next section, we present a simplified version of a method due to M. Boyer, G. Brassard, P. Høyer and A. Tapp [11] for finding the required element even if $k$ is not known.

5.3 Utilizing Grover's Search Method

5.3.1 Searching with an Unknown Number of Solutions

We begin with an elementary lemma, whose proof is left as an exercise.

Lemma 5.3.1. For any real $\alpha$ and any positive integer $m$,

$$\sum_{r=0}^{m-1}\cos((2r+1)\alpha) = \frac{\sin(2m\alpha)}{2\sin\alpha}.$$

The following important lemma can be found in [11].

Lemma 5.3.2. Let $f:\mathbb{F}_2^n\to\mathbb{F}_2$ be a blackbox function with $k\leq\frac34\cdot2^n$ solutions and $\theta_0\in[0,\frac\pi3]$ defined by the equation $\sin^2\theta_0 = \frac{k}{2^n}$. Let $m$ be any positive integer and $r\in[0,m-1]$ chosen with uniform distribution. If $G_n$ is applied $r$ times to the initial superposition

$$\frac{1}{\sqrt{2^n}}\sum_{x\in\mathbb{F}_2^n}|x\rangle,$$

then the probability of seeing a solution is

$$P_m = \frac12 - \frac{\sin(4m\theta_0)}{4m\sin(2\theta_0)}.$$

Proof. In the previous section we saw that, after $r$ iterations of $G_n$, the probability of seeing a solution is $\sin^2((2r+1)\theta_0)$. Thus, if $r\in[0,m-1]$ is chosen uniformly, then the probability of seeing a solution is

$$P_m = \frac1m\sum_{r=0}^{m-1}\sin^2((2r+1)\theta_0) = \frac1{2m}\sum_{r=0}^{m-1}\big(1-\cos((2r+1)\cdot2\theta_0)\big) = \frac12 - \frac{\sin(4m\theta_0)}{4m\sin(2\theta_0)}$$

according to Lemma 5.3.1. $\Box$

Remark 5.3.1. If $m>\frac{1}{\sin(2\theta_0)}$, then

$$\frac{\sin(4m\theta_0)}{4m\sin(2\theta_0)} \leq \frac{1}{4m\sin(2\theta_0)} \leq \frac14,$$

and therefore $P_m\geq\frac14$. The above lemma implies, then, that if $m$ is large enough, applying $G_n$ $r$ times, where $r\in[0,m-1]$ is chosen uniformly, will yield a solution with a probability of at least $\frac14$ anyway.

Assume now that the unknown number $k$ satisfies $0<k\leq\frac34\cdot2^n$. Then

$$\frac{1}{\sin(2\theta_0)} = \frac{1}{2\sin\theta_0\cos\theta_0} = \frac{2^n}{2\sqrt{k(2^n-k)}} \leq \sqrt{\frac{2^n}{k}} \leq \sqrt{2^n},$$

so choosing $m\geq\sqrt{2^n}$ we certainly have $m\geq\frac{1}{\sin(2\theta_0)}$.
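Lemma 5.3.2 and Remark 5.3.1 admit a quick numerical sanity check; a throwaway script of mine, evaluating $P_m$ for a few values of $k$ with $m\geq\sqrt{2^n}$:

```python
import math

def P(m, n, k):
    """P_m = 1/2 - sin(4 m theta_0) / (4 m sin(2 theta_0)), Lemma 5.3.2."""
    theta0 = math.asin(math.sqrt(k / 2 ** n))
    return 0.5 - math.sin(4 * m * theta0) / (4 * m * math.sin(2 * theta0))

n = 16
m = int(math.sqrt(2 ** n)) + 1          # m >= sqrt(2^n) >= 1/sin(2 theta_0)
for k in (1, 10, 1000, 3 * 2 ** n // 4):
    assert P(m, n, k) >= 0.25           # Remark 5.3.1
print("P_m >= 1/4 verified for the sampled k")
```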
This leads to the following algorithm:

Quantum Search Algorithm
Input: A blackbox function $f:\mathbb{F}_2^n\to\mathbb{F}_2$.
Output: Any $y\in\mathbb{F}_2^n$ such that $f(y)=1$ (a solution), if such an element exists.
1. Pick an element $x\in\mathbb{F}_2^n$ randomly with uniform distribution. If $f(x)=1$, then output $x$ and stop.
2. Otherwise, let $m = \lfloor\sqrt{2^n}\rfloor+1$ and choose an integer $r\in[0,m-1]$ uniformly.
3. Prepare the initial superposition $\frac{1}{\sqrt{2^n}}\sum_{x\in\mathbb{F}_2^n}|x\rangle$ by using the Hadamard transform $H_n$, and apply $G_n$ $r$ times.
4. Observe to get some $y\in\mathbb{F}_2^n$.

Using the previous lemmata, the correctness probability of this algorithm is easy to analyze. Let $k$ be the number of solutions of $f$. If $k>\frac34\cdot2^n$, then the algorithm will output a solution after the first instruction with a probability of at least $\frac34$. On the other hand, if $k\leq\frac34\cdot2^n$, then

$$m\geq\sqrt{2^n}\geq\frac{1}{\sin(2\theta_0)},$$

and, according to Remark 5.3.1, the probability of seeing a solution is at least $\frac14$ anyway.

Remark 5.3.2. Since the above algorithm is guaranteed to work with a probability of at least $\frac14$, we can say that, on average, a solution is found within four attempts. In each attempt, the number of queries to $f$ is at most $\sqrt{2^n}$. A linear improvement of the above algorithm can be obtained by employing the methods of [11]. In any case, the above algorithm yields the following theorem.

Theorem 5.3.1. By using a quantum circuit making $O(\sqrt{2^n})$ queries to a blackbox function $f$, one can decide with nonvanishing correctness probability whether there is an element $x\in\mathbb{F}_2^n$ such that $f(x)=1$.

In the next chapter we will present a clever technique formulated by R. Beals, H. Buhrman, R. Cleve, M. Mosca, and R. de Wolf [4], which will allow us to demonstrate that the above algorithm is, in fact, optimal up to a constant factor: any quantum algorithm that can discover, with nonvanishing probability, whether a solution to $f$ exists uses $\Omega(\sqrt{2^n})$ queries to $f$. This result concerns blackbox functions, so it does not imply a complexity lower bound for computable functions (see Remark 5.1.2). On the other hand, the blackbox function $f$ can be replaced with a computable function to get the following theorem.

Theorem 5.3.2. By using a quantum circuit, any problem in NP can be solved with nonvanishing correctness probability in time $O(\sqrt{2^n}\,p(n))$, where $p$ is a polynomial depending on the particular problem.

In the above theorem, the polynomial $p(n)$ is essentially the time that the nondeterministic computation needs to solve the problem (see Section 2.1.2).

Grover's search algorithm has been cunningly used for several purposes. To mention a few examples, see [13] for a quantum counting algorithm and [27] for a minimum finding algorithm. In [11] the authors outline a generalized search algorithm for sets other than $\mathbb{F}_2^n$.
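To tie the chapter together, here is a hedged end-to-end simulation of the Quantum Search Algorithm above (a state-vector sketch under naming of my own, not the book's code), reusing the matrix forms of $H_n$, $R_n$ and $V_f$ from Section 5.2.1:

```python
import math
import random
import numpy as np

def quantum_search(f, n, rng):
    """Simulate the unknown-k quantum search algorithm of Section 5.3.1."""
    N = 2 ** n
    # Step 1: one classical guess.
    x = rng.randrange(N)
    if f(x) == 1:
        return x
    # Step 2: random iteration count r in [0, m-1], m = floor(sqrt(2^n)) + 1.
    m = math.isqrt(N) + 1
    r = rng.randrange(m)
    # Step 3: uniform superposition, then r applications of G_n.
    H1 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    Hn = H1
    for _ in range(n - 1):
        Hn = np.kron(Hn, H1)
    Rn = np.eye(N); Rn[0, 0] = -1.0
    Vf = np.diag([(-1.0) ** f(x) for x in range(N)])
    Gn = -Hn @ Rn @ Hn @ Vf
    state = np.full(N, 1 / np.sqrt(N))
    for _ in range(r):
        state = Gn @ state
    # Step 4: observe.
    return rng.choices(range(N), weights=(state ** 2).tolist())[0]

f = lambda x: 1 if x in (3, 9) else 0    # two unknown solutions among 2^8
hits = sum(f(quantum_search(f, 8, random.Random(i))) for i in range(200))
print(f"solutions found in {hits}/200 runs")   # expected well above 1/4 of runs
```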
6. Complexity Lower Bounds for Quantum Circuits

Unfortunately, the most interesting questions related to quantum complexity theory, for example, "Is NP contained in BQP or not?", are extremely difficult and far beyond present knowledge. In this chapter we will mention some complexity-theoretic aspects of quantum computation which may somehow illustrate these difficult questions.

6.1 General Idea

A general technique for deriving complexity-theoretic lower bounds for quantum circuits using blackbox function queries was introduced by R. Beals, H. Buhrman, R. Cleve, M. Mosca, and R. de Wolf in [4]. Before going into the details, let us briefly discuss the idea informally.

To simplify some notations, we will sometimes interpret an element $x\in\mathbb{F}_2^n$ as a binary number in the interval $[0,2^n-1]$. Assume that a quantum circuit finds out whether an arbitrary blackbox function has some property or not (for instance, in Section 5.1.2 we were interested in whether there is an element $y\in\mathbb{F}_2^n$ such that $f(y)=1$). This quantum circuit can then be seen as a device which is given the vector $(f(0),f(1),\ldots,f(2^n-1))$ (the values of $f$) as an input, and which outputs "yes" or "no" according to whether $f$ has the property or not. In other words, the quantum circuit can be seen as a device for computing a $\{0,1\}$-valued function on the input vector $(f(0),f(1),\ldots,f(2^n-1))$ (unknown to us), which is to say that the circuit is a device for computing a Boolean function on the $2^n$ input variables $f(0),f(1),\ldots,f(2^n-1)$.

Although a lot of things about Boolean functions remain unknown, fortunately we do know many things about them; see [48] for discussion. It turns out that the polynomial representations of Boolean functions will give us the desired information about the number of blackbox queries needed to find out some property of $f$. The bound will be expressed in terms of the polynomial representation degree for quantum circuits that compute the property (a Boolean function) exactly (with a probability of 1), and in terms of the polynomial approximation degree for circuits that merely have a high probability of finding that property. We will see that twice the number of queries to the blackbox function bounds from above the representation degree of the Boolean function computable with a particular quantum circuit. The lower bound for the number of necessary queries is thus established by finding the representation degree of the Boolean function to be computed: the quantum circuit must make enough queries to reach the representation degree. Because the polynomial representations and approximations play such a key role, we will devote the following two sections to studying them.

6.2 Polynomial Representations

6.2.1 Preliminaries

In this section, we present the basic facts about polynomial representations of Boolean functions and symmetric polynomials. A reader who is already familiar with these topics may skip this section.

Let $N = 2^n$. As discussed in Sections 3.1.1 and 8.2.2, the functions $F:\mathbb{F}_2^N\to\mathbb{C}$ form a $2^N$-dimensional vector space $V$ over the complex numbers. The idea is that Boolean functions on $N$ variables can be viewed as elements of $V$: they are functions defined on $N$ binary variables, taking values in the two-element set $\{0,1\}$, which can be embedded in $\mathbb{C}$. For each subset $K = \{k_1,k_2,\ldots,k_l\}\subseteq\{0,1,\ldots,N-1\}$ we consider a function

$$X_K:\mathbb{F}_2^N\to\mathbb{C}, \qquad (6.1)$$

which we denote also by $X_K = X_{k_1}X_{k_2}\cdots X_{k_l}$ and define in the most natural way: the value of the function $X_K$ on a vector $x = (x_0,x_1,\ldots,x_{N-1})$ is the product $x_{k_1}x_{k_2}\cdots x_{k_l}$, which is interpreted as the number 0 or 1. This may require some explanation: $x$ is an element of $\mathbb{F}_2^N$, and each of its components $x_i$ is an element of the binary field $\mathbb{F}_2$. Thus, the product $x_{k_1}x_{k_2}\cdots x_{k_l}$ also belongs to $\mathbb{F}_2$, but embedding $\mathbb{F}_2$ in $\mathbb{C}$, we may interpret the product as belonging to $\mathbb{C}$ as well. It is natural to call the functions $X_K$ monomials and to denote $X_\emptyset = 1$ (an empty product having no factors at all is interpreted as 1). If $|K| = l$, the degree of a monomial $X_K = X_{k_1}X_{k_2}\cdots X_{k_l}$ is defined to be $\deg X_K = l$.
A linear combination of monomials $X_{K_1},\ldots,X_{K_s}$ is naturally denoted by

$$c_1X_{K_1}+\cdots+c_sX_{K_s} \qquad (6.2)$$

and is called a polynomial. The degree $\deg P$ of the polynomial (6.2) is defined as the highest degree of a monomial occurring in (6.2), except in the case of the zero polynomial $P = 0$, whose degree we symbolically define to be $-\infty$.

A reader may now wonder why we treat such a simple concept as a polynomial so carefully. The problematic point is that, over a finite field, the same function can be defined by many different polynomials. For example, both elements of the binary field $\mathbb{F}_2$ satisfy the equation $x^2 = x$ and, thus, the nonconstant polynomial $x^2-x+1$ behaves like the constant polynomial 1. We will, therefore, define the product of monomials $X_K$ and $X_L$ by considering $X_K$ and $X_L$ as functions $\mathbb{F}_2^N\to\mathbb{F}_2$. It is plain to see that this results in the definition

$$X_KX_L = X_{K\cup L}.$$

The product of two polynomials is defined as usual: if

$$P_1 = \sum_{i=1}^s c_iX_{K_i} \quad\text{and}\quad P_2 = \sum_{j=1}^t d_jX_{L_j}$$

are polynomials, we define

$$P_1P_2 = \sum_{i=1}^s\sum_{j=1}^t c_id_j\,X_{K_i}X_{L_j}.$$

This product differs slightly from the ordinary product of polynomials. Take, for example, $K = \{0,1\}$ and $L = \{1,2\}$. Then

$$X_KX_L = X_0X_1X_1X_2 = X_0X_1X_2 = X_{\{0,1,2\}},$$

but if $X_0X_1$ and $X_1X_2$ were seen as "ordinary polynomials" over $\mathbb{F}_2$, the product would have been $X_0X_1^2X_2$. In fact, the functions $X_K$ behave just like ordinary polynomials, except that all powers above 1 must be reduced to 1.¹

Lemma 6.2.1. The monomials form a basis of $V$.

Proof. For any $y = (y_0,y_1,\ldots,y_{N-1})\in\mathbb{F}_2^N$, define a polynomial

$$P_y = (1+y_0-X_0)(1+y_1-X_1)\cdots(1+y_{N-1}-X_{N-1}).$$

By expanding the product, we get a representation of $P_y$ as a sum of monomials, and for any $x = (x_0,x_1,\ldots,x_{N-1})\in\mathbb{F}_2^N$,

$$P_y(x) = 1 \iff 1+y_i-x_i = 1 \text{ for each } i \iff x_i = y_i \text{ for each } i \iff x = y,$$

which is to say that $P_y$ is the characteristic function of the singleton set $\{y\}$. Because we can express each characteristic function $\mathbb{F}_2^N\to\mathbb{C}$ as a sum of monomials, we can surely express any function $\mathbb{F}_2^N\to\mathbb{C}$ as a linear combination of monomials (see Section 8.2.2). Therefore, the monomials generate $V$, and since there are $2^N$ different monomials, they are also linearly independent. $\Box$

¹ For a reader who is aware of algebraic structures: the polynomial functions considered here form a ring isomorphic to $\mathbb{F}_2[X_0,X_1,\ldots,X_{N-1}]/I$, where $I$ is the ideal generated by all polynomials $X_i^2-X_i$.

It follows from the above lemma that each function $\mathbb{F}_2^N\to\mathbb{C}$ can be uniquely expressed as a linear combination of monomials (6.1) with complex coefficients. Clearly, all Boolean functions on $N$ variables are included. In fact, we could also have derived such a representation for Boolean functions directly: Boolean variables can be represented by the monomials $X_0,\ldots,X_{N-1}$, and if $P_1$ and $P_2$ are polynomial representations of some Boolean functions, then $\lnot P_1$ can be represented as $1-P_1$, $P_1\land P_2$ as $P_1P_2$, and $P_1\lor P_2$ as $1-(1-P_1)(1-P_2)$. Since all Boolean functions can be constructed by using $\lnot$, $\land$, and $\lor$ (see Section 2.3.1), we have the following lemma.

Lemma 6.2.2. Each Boolean function on $N$ variables has a unique representation as a multivariate polynomial $P(X_0,X_1,\ldots,X_{N-1})$ having real coefficients and degree at most $N$.

We must keep in mind that "polynomial" in the above lemma is interpreted in such a way that $X^k = X$ for each $k\geq1$.
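To make Lemma 6.2.2 concrete, here is a small illustrative script (entirely my own, assuming inputs are given as 0/1 integer lists) that recovers the unique multilinear polynomial of a Boolean function by Möbius inversion over subsets, and confirms for $N=4$ that the AND function has degree $N$ (a fact discussed in Section 6.2.2 below):

```python
from itertools import combinations

def multilinear_coeffs(B, N):
    """Coefficient c_S of the monomial X_S in the unique polynomial of B.

    Moebius inversion: c_S = sum over T subseteq S of (-1)^{|S|-|T|} B(1_T),
    where 1_T is the 0/1 vector with support T.
    """
    coeffs = {}
    for size in range(N + 1):
        for S in combinations(range(N), size):
            c = 0
            for tsize in range(size + 1):
                for T in combinations(S, tsize):
                    x = [1 if i in T else 0 for i in range(N)]
                    c += (-1) ** (size - tsize) * B(x)
            if c != 0:
                coeffs[S] = c
    return coeffs

N = 4
AND = lambda x: int(all(x))
coeffs = multilinear_coeffs(AND, N)
print(coeffs)                          # {(0, 1, 2, 3): 1}: AND = X_0 X_1 X_2 X_3
print(max(len(S) for S in coeffs))     # 4: deg(AND) = N
```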
In the continuation we will mainly concentrate on symmetric Boolean functions, which form a subclass of the symmetric functions $F:\mathbb{F}_2^N\to\mathbb{C}$. Symmetric functions are formally defined as follows:

Definition 6.2.1. Let $x = (x_0,x_1,\ldots,x_{N-1})\in\mathbb{F}_2^N$ be any vector and $\pi:\{0,1,\ldots,N-1\}\to\{0,1,\ldots,N-1\}$ any permutation. Denote $\pi(x) = (x_{\pi(0)},x_{\pi(1)},\ldots,x_{\pi(N-1)})$. A function $F:\mathbb{F}_2^N\to\mathbb{C}$ is symmetric if $F(x) = F(\pi(x))$ holds for any vector $x\in\mathbb{F}_2^N$ and any permutation $\pi\in S_N$.

Clearly, any linear combination of symmetric functions is again symmetric; thus, the symmetric functions form a subspace $W\subseteq V$. We will now find a basis for $W$. Let $P$ be a nonzero polynomial representing a symmetric function. Then $P$ can be represented as a sum of homogeneous polynomials²

$$P = Q_0+Q_1+\cdots+Q_N$$

such that $\deg Q_i = i$. Because $P$ itself was assumed to be symmetric, each $Q_i$ must also be a symmetric polynomial (otherwise, by permuting the variables, we could get a different polynomial representing the same function as $P$ does, which is impossible). Thus, we can write

$$Q_i = \sum_{\{k_1,k_2,\ldots,k_i\}} c_{k_1,k_2,\ldots,k_i}\,X_{k_1}X_{k_2}\cdots X_{k_i},$$

where the sum is taken over all subsets of $\{0,1,\ldots,N-1\}$ having cardinality $i$. Because $Q_i$ is invariant under variable permutations, and because the representation as a polynomial is unique, we must have that $c_{k_1,k_2,\ldots,k_i} = c_i$ independently of the choice of $\{k_1,k_2,\ldots,k_i\}$. That is to say, the symmetric polynomials

$$V_0 = 1,\quad V_1 = X_0+X_1+\cdots+X_{N-1},\quad V_2 = X_0X_1+X_0X_2+\cdots+X_{N-2}X_{N-1},\quad\ldots,\quad V_N = X_0X_1\cdots X_{N-1}$$

generate $W$. Because they have different degrees, they are also linearly independent.

The following triviality is worth mentioning: the value of a symmetric function $F(x)$ depends only on the Hamming weight of $x$, which is given by

$$\mathrm{wt}(x) = |\{i\mid x_i = 1\}|.$$

We can even quite easily find the value of $V_i(x)$ when $\mathrm{wt}(x) = k$: consider the polynomial

$$P(T) = (T-X_0)(T-X_1)\cdots(T-X_{N-1}).$$

Since $P$ is invariant under any permutation of $X_0,X_1,\ldots,X_{N-1}$, the coefficients of $P(T)$ are symmetric polynomials in $X_0,X_1,\ldots,X_{N-1}$. In fact, by expanding $P$, we find that the coefficient of $T^{N-i}$ is exactly $(-1)^iV_i$. On the other hand, by substituting 1 for $k$ of the variables $X_j$ and 0 for the rest, $P(T)$ becomes

$$(T-1)^kT^{N-k},$$

which tells us that $V_i(x) = \binom{k}{i}$ if $i\leq k$, and $V_i(x) = 0$ if $i>k$. This leads us to an important observation which will be used later.

² A homogeneous polynomial is a linear combination of monomials having the same degree.

Lemma 6.2.3. If $P$ is a symmetric polynomial of $N$ variables having complex (resp. real) coefficients and degree $d$, then there is a polynomial $P_w$ in a single variable with complex (resp. real) coefficients and degree at most $d$ such that $P_w(\mathrm{wt}(x)) = P(x)$ for each $x\in\mathbb{F}_2^N$.

Proof. Because $P$ is symmetric, it can be represented as $P = c_0V_0+c_1V_1+\cdots+c_dV_d$. The claim follows from the fact that $V_i(x) = \binom{\mathrm{wt}(x)}{i}$ is a polynomial of $\mathrm{wt}(x)$ having degree $i$. $\Box$
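The identity $V_i(x) = \binom{\mathrm{wt}(x)}{i}$ used in the proof above is easy to spot-check exhaustively for a small $N$; a throwaway script of mine:

```python
from itertools import combinations
from math import comb

N = 6
for x in range(2 ** N):
    bits = [(x >> i) & 1 for i in range(N)]
    wt = sum(bits)
    for i in range(N + 1):
        # V_i(x): sum of all products of i distinct coordinates of x.
        Vi = sum(all(bits[j] for j in S) for S in combinations(range(N), i))
        assert Vi == comb(wt, i)
print("V_i(x) == C(wt(x), i) verified for N =", N)
```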
6.2.2 Bounds for the Representation Degrees

There are many complexity measures for a Boolean function: the number of Boolean gates $\lnot$, $\land$, and $\lor$ needed to implement the function (Section 2.3.1) and the decision tree complexity, to mention a few. For our purposes, the ideal complexity measure is the degree of the multivariate polynomial representing the function. We follow the notations of [4] in the definition below.

Definition 6.2.2. Let $B:\mathbb{F}_2^N\to\{0,1\}$ be a Boolean function on $N$ variables. The polynomial representation degree $\deg(B)$ of $B$ is the degree of the unique multivariate polynomial representing $B$.

For future use, we also define polynomial approximations of a Boolean function. Since the interesting results about approximation pertain to polynomials with real coefficients, we will restrict ourselves to polynomials with real coefficients.

Definition 6.2.3. A multivariate polynomial $P$ having real coefficients approximates a Boolean function $B:\mathbb{F}_2^N\to\{0,1\}$ if

$$|B(x)-P(x)|\leq\frac13$$

for each $x\in\mathbb{F}_2^N$.

Clearly, there may be many polynomials which approximate a single Boolean function $B$, but we are interested in the minimum degree of an approximating polynomial. Therefore, following [4], we give the following definition.

Definition 6.2.4. The polynomial approximation degree of a Boolean function $B$ is

$$\widetilde{\deg}(B) = \min\{\deg(P)\mid P \text{ is a polynomial approximating } B\}.$$

It is a nontrivial task to find bounds for the polynomial representation degree, but we can derive a general bound for symmetric Boolean functions (for more bounds, see [4] and the references therein). The following lemma can be found in [4]:

Lemma 6.2.4. Let $B:\mathbb{F}_2^N\to\{0,1\}$ be a non-constant symmetric Boolean function. Then $\deg(B)\geq N/2+1$.

Proof. Let $P$ be the symmetric multivariate polynomial of degree $\deg(B)$ that represents $B$. By Lemma 6.2.3, there is a single-variable polynomial $P_w$ of degree at most $\deg(P)$ such that $P_w(\mathrm{wt}(x)) = P(x)$ for each $x\in\mathbb{F}_2^N$. As a representation of the Boolean function $B$, $P_w$ takes only the values 0 and 1 on the $N+1$ possible Hamming weights $0,1,\ldots,N$. Since $N$ is even, $P_w$ takes the value 0 at least $N/2+1$ times or the value 1 at least $N/2+1$ times. In the former case, $P_w$ (which is nonzero, $B$ being non-constant) has at least $N/2+1$ roots, so $\deg(P_w)\geq N/2+1$; in the latter case, the same argument applied to $P_w-1$ gives $\deg(P_w) = \deg(P_w-1)\geq N/2+1$. Altogether, we find that

$$\deg(B) = \deg(P) \geq \deg(P_w) \geq N/2+1. \qquad\Box$$

For some particular functions we can, of course, get better results than the previous lemma gives. Consider, for instance, the function $\mathrm{AND}:\mathbb{F}_2^N\to\{0,1\}$ defined by $\mathrm{AND}(x)=1$ if and only if $\mathrm{wt}(x)=N$. Clearly AND is a symmetric function, and the corresponding single-variable polynomial $\mathrm{AND}_w$ representing AND has the $N$ zeros $\{0,1,\ldots,N-1\}$. Therefore, $\deg(\mathrm{AND}) = N$. Using a similar argument, we also find that $\deg(\mathrm{OR}) = N$, where OR is the function taking the value 0 if and only if $\mathrm{wt}(x)=0$.

It appears much more difficult to find bounds for the degree of an approximating polynomial. R. Paturi has proved [50] the following theorem, which characterizes the degree of polynomials approximating symmetric Boolean functions. To state Paturi's theorem, we fix some notations. Let $B:\mathbb{F}_2^N\to\{0,1\}$ be a symmetric Boolean function. Then $B$ can be represented as $B_w(\mathrm{wt}(x)) = B(x)$, a function depending only on the Hamming weight of $x$. Let

$$\Gamma(B) = \min\{|2k-N+1| \;:\; B_w(k)\neq B_w(k+1) \text{ and } 0\leq k\leq N-1\}.$$

Thus, $\Gamma(B)$ measures how close to weight $N/2$ the function $B_w$ changes value: if $B_w(k)\neq B_w(k+1)$ for some $k$ approximately $N/2$, then $\Gamma(B)$ is low. Recall that $B_w(k)\neq B_w(k+1)$ means that, if $\mathrm{wt}(x)=k$, then the value $B(x)$ changes if we flip one more coordinate of $x$ to 1.

Example 6.2.1. Consider the function OR, whose only change occurs when the weight of the argument increases from 0 to 1. Thus, $\Gamma(\mathrm{OR}) = N-1$. Similarly, we see that the only jump of the function AND occurs when the weight of $x$ increases from $N-1$ to $N$. Therefore, we have that $\Gamma(\mathrm{AND}) = N-1$.
Example 6.2.2. Let $\mathrm{PARITY}:\mathbb{F}_2^N\to\{0,1\}$ be the function defined by $\mathrm{PARITY}(x)=1$ if $\mathrm{wt}(x)$ is divisible by 2, and 0 otherwise. The function PARITY changes its value whenever the Hamming weight of $x$ increases. For our choice $N = 2^n$, $N$ is even, so $\Gamma(\mathrm{PARITY}) = 1$; the theorem below also holds if $N$ is odd, and then $\Gamma(\mathrm{PARITY}) = 0$. The function $\mathrm{MAJORITY}:\mathbb{F}_2^N\to\{0,1\}$ is defined to take the value 0 if $\mathrm{wt}(x)<N/2$, and 1 if $\mathrm{wt}(x)\geq N/2$. For even $N$, the only jump occurs when the weight of $x$ passes from $N/2-1$ to $N/2$, and for odd $N$ when $\mathrm{wt}(x)$ increases from $(N-1)/2$ to $(N+1)/2$; so, $\Gamma(\mathrm{MAJORITY}) = 1$ for even $N$ and 0 for odd $N$.

Theorem 6.2.1 (R. Paturi, [50]). Let $B:\mathbb{F}_2^N\to\{0,1\}$ be a non-constant symmetric Boolean function. Then

$$\widetilde{\deg}(B) = \Theta\big(\sqrt{N(N-\Gamma(B))}\big).$$

Earlier, we saw that $\deg(\mathrm{OR}) = \deg(\mathrm{AND}) = N$, which means that an exact representation of OR and AND requires a polynomial of degree $N$. On the other hand, by using the previous examples and Paturi's theorem, we find that a polynomial approximating these functions may have an essentially lower degree: $\widetilde{\deg}(\mathrm{OR}) = \Theta(\sqrt N)$ and $\widetilde{\deg}(\mathrm{AND}) = \Theta(\sqrt N)$. However, $\widetilde{\deg}(\mathrm{PARITY}) = \Theta(N)$ and $\widetilde{\deg}(\mathrm{MAJORITY}) = \Theta(N)$, so even an approximating polynomial for these two latter functions has degree $\Omega(N)$.
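The jump parameter $\Gamma(B)$ is straightforward to compute from the weight function $B_w$; a small helper of my own for the four functions above:

```python
def gamma(Bw, N):
    """Gamma(B) = min |2k - N + 1| over the k where B_w(k) != B_w(k+1)."""
    return min(abs(2 * k - N + 1) for k in range(N) if Bw(k) != Bw(k + 1))

N = 16
print(gamma(lambda k: int(k > 0), N))        # OR:       N - 1 = 15
print(gamma(lambda k: int(k == N), N))       # AND:      N - 1 = 15
print(gamma(lambda k: int(k % 2 == 0), N))   # PARITY:   1 (N even)
print(gamma(lambda k: int(k >= N / 2), N))   # MAJORITY: 1 (N even)
```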
6.3 Quantum Circuit Lower Bound

6.3.1 General Lower Bound

We are now ready to present the idea of [4], which connects quantum circuits computing Boolean functions with polynomials representing and approximating Boolean functions. We consider blackbox functions $f:\mathbb{F}_2^n\to\mathbb{F}_2$ and, as earlier, quantum blackbox queries are modelled by using a query operator $Q_f$ operating as

$$Q_f\,|x\rangle|b\rangle = |x\rangle|b\oplus f(x)\rangle.$$

There may also be more qubits for other computational purposes but, without violating generality, we may assume that a blackbox query takes place on fixed qubits. Therefore, we can assume that a general state of a quantum circuit computing a property of $f$ is a superposition of states of the form

$$|x\rangle|b\rangle|w\rangle, \qquad (6.3)$$

where the first register has the $n$ qubits which are used as the source bits of a blackbox query, $b\in\mathbb{F}_2$ is the target qubit, and $w$ is a string of some number, say $m$, of qubits which are needed for other computational purposes. There are $2^{n+m+1}$ different states (6.3), and we can expand the definition of $Q_f$ to include all states (6.3) by letting it operate on $|w\rangle$ as the identity; we will denote the enlarged operator again by $Q_f$:

$$Q_f\,|x\rangle|b\rangle|w\rangle = |x\rangle|b\oplus f(x)\rangle|w\rangle.$$

We begin by finding the eigenvectors of $Q_f$. It is easy to verify that, if $s$ is either 0 or 1, then

$$Q_f\big(|x\rangle|0\rangle|w\rangle+(-1)^s|x\rangle|1\rangle|w\rangle\big) = (-1)^{f(x)\cdot s}\big(|x\rangle|0\rangle|w\rangle+(-1)^s|x\rangle|1\rangle|w\rangle\big), \qquad (6.4)$$

which means that

$$|x\rangle|0\rangle|w\rangle+(-1)^s|x\rangle|1\rangle|w\rangle \qquad (6.5)$$

is an eigenvector of $Q_f$ belonging to the eigenvalue $(-1)^{f(x)\cdot s}$. On the other hand, vector (6.5) can be chosen in $2^{n+m+1}$ different ways, and it is simple to verify that these choices form an orthogonal set, so the vectors (6.5) are all the eigenvectors of $Q_f$.

It is clear that if the quantum circuit uses the query operator $Q_f$, the state of the quantum circuit depends on the values of $f$, but equation (6.4) tells us the dependence in a more explicit manner: $Q_f$ introduces a factor $(-1)^{f(x)\cdot s}$, which has the polynomial representation $1-2sf(x)$. The next lemma from [4] expresses this dependency more precisely.

Lemma 6.3.1. Let $Q$ be a quantum circuit that makes $T$ queries to a blackbox function $f:\mathbb{F}_2^n\to\mathbb{F}_2$. Let $N = 2^n$ and identify each binary string in $\mathbb{F}_2^n$ with the number it represents. Then the final state of the circuit is a superposition of the states (6.3) whose coefficients are functions of the values $X_0 = f(0)$, $X_1 = f(1)$, ..., $X_{N-1} = f(N-1)$. These functions have a polynomial representation degree of at most $T$.

Proof. The computation of $Q$ can be viewed as a sequence of unitary transformations

$$U_0,\,Q_f,\,U_1,\,Q_f,\,U_2,\;\ldots,\;U_{T-1},\,Q_f,\,U_T$$

in the complex vector space spanned by all the states (6.3). The operators $U_i$ are fixed, but $Q_f$, of course, depends on the particular blackbox function $f$, so the coefficients are also functions of $X_0,X_1,\ldots,X_{N-1}$. At the beginning of the previous section we learned that each such function has a unique polynomial representation. We prove the claim on the degree by induction. Before the first query, the state of the circuit is clearly a superposition of states (6.3) with constant coefficients. Assume, then, that after $k$ queries and after applying the operator $U_k$, the circuit state is a superposition

$$\sum_{x\in\mathbb{F}_2^n}\sum_{b\in\mathbb{F}_2}\sum_{w\in\mathbb{F}_2^m}P_{(x,b,w)}\,|x\rangle|b\rangle|w\rangle, \qquad (6.6)$$

where each $P_{(x,b,w)}$ is a polynomial in the variables $X_0,X_1,\ldots,X_{N-1}$ having a degree of at most $k$. The $(k+1)$st query will translate $|x\rangle|0\rangle|w\rangle$ into $|x\rangle|1\rangle|w\rangle$ and vice versa if $X_x = f(x) = 1$, but this translation will not happen if $X_x = f(x) = 0$ (again, we interpret $x$ as the binary representation of an integer in $[0,N-1]$). This is to say that a partial sum

$$P_{(x,0,w)}|x\rangle|0\rangle|w\rangle + P_{(x,1,w)}|x\rangle|1\rangle|w\rangle \qquad (6.7)$$

in (6.6) will transform into

$$\big(X_xP_{(x,1,w)}+(1-X_x)P_{(x,0,w)}\big)|x\rangle|0\rangle|w\rangle + \big(X_xP_{(x,0,w)}+(1-X_x)P_{(x,1,w)}\big)|x\rangle|1\rangle|w\rangle. \qquad (6.8)$$

Using the assumption that $\deg P_{(x,b,w)}\leq k$, (6.8) directly tells us that, after $k+1$ queries to $Q_f$, the coefficients are polynomials of degree at most $k+1$. The operator $U_{k+1}$ produces linear combinations of these coefficient functions, which cannot increase the representation degrees. $\Box$

Remark 6.3.1. The transformation from (6.7) to (6.8) caused by $Q_f$ is, of course, exactly the same as (6.4); it is only expressed in a different basis. Utilizing the basis (6.5), Farhi et al. [29] have, independently of [4], derived a lower bound for the number of quantum queries needed to approximate PARITY.

Remark 6.3.2. Notice that the proof of Lemma 6.3.1 does not utilize the fact that each $U_i$ is unitary; only linearity is needed. In fact, if we could use nonlinear operators as well, the reader may easily verify that we could achieve a much faster growth in the representation degree of the coefficients.
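Lemma 6.3.1 can be probed symbolically on a toy circuit. The following SymPy sketch (entirely my own construction; random orthogonal matrices stand in for the fixed unitaries $U_i$) builds the query operator with symbolic entries $X_x$ exactly as in (6.8) and checks that every final amplitude has total degree at most $T$:

```python
import numpy as np
import sympy as sp

n, T = 2, 2
N = 2 ** n
X = sp.symbols(f"X0:{N}")              # X_x stands for the unknown value f(x)

# Query operator on n source qubits + 1 target qubit, as in (6.7)-(6.8).
Qf = sp.zeros(2 * N, 2 * N)
for x in range(N):
    Qf[2*x, 2*x] = 1 - X[x]; Qf[2*x, 2*x+1] = X[x]
    Qf[2*x+1, 2*x] = X[x];   Qf[2*x+1, 2*x+1] = 1 - X[x]

rng = np.random.default_rng(1)
def rand_u():
    q, _ = np.linalg.qr(rng.normal(size=(2 * N, 2 * N)))
    return sp.Matrix(q)                # arbitrary fixed (orthogonal) operator

state = sp.Matrix([1] + [0] * (2 * N - 1))
state = rand_u() * state
for _ in range(T):
    state = rand_u() * (Qf * state)    # one query, then one fixed operator

def deg(expr):
    e = sp.expand(expr)
    return 0 if e == 0 else sp.Poly(e, *X).total_degree()

assert all(deg(a) <= T for a in state)
print("every amplitude has degree <= T =", T)
```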
The following theorem from [4] finally connects quantum circuits and polynomial representations.

Theorem 6.3.1. Let $N = 2^n$, let $f:\mathbb{F}_2^n\to\mathbb{F}_2$ be an arbitrary blackbox function, and let $Q$ be a quantum circuit that computes a Boolean function $B$ on the $N$ variables $X_0 = f(0)$, $X_1 = f(1)$, ..., $X_{N-1} = f(N-1)$.

1. If $Q$ computes $B$ with a probability of 1 using $T$ queries to $f$, then $T\geq\deg(B)/2$.
2. If $Q$ computes $B$ with a correctness probability of at least $\frac23$ using $T$ queries to $f$, then $T\geq\widetilde{\deg}(B)/2$.

Proof. Without loss of generality, we may assume that $Q$ gives the value of $B$ by setting the right-most qubit to $B(X_0,X_1,\ldots,X_{N-1})$, which is then observed. By Lemma 6.3.1, the final state of $Q$ is a superposition

$$\sum_{x\in\mathbb{F}_2^n}\sum_{b\in\mathbb{F}_2}\sum_{w\in\mathbb{F}_2^m}P_{(x,b,w)}\,|x\rangle|b\rangle|w\rangle,$$

where each coefficient $P_{(x,b,w)}$ is a function of $X_0,\ldots,X_{N-1}$ which has a polynomial representation with a degree of at most $T$. The probability of seeing 1 on the right-most qubit is given by

$$P(X_0,\ldots,X_{N-1}) = \sum|P_{(x,b,w)}|^2, \qquad (6.9)$$

where the sum is taken over all those states (6.3) in which the right-most bit of $w$ is 1. By considering the real and imaginary parts separately, we see that $P(X_0,\ldots,X_{N-1})$ can, in fact, be represented as a polynomial in the variables $X_0,\ldots,X_{N-1}$ with real coefficients and degree at most $2T$. If $Q$ computes $B$ exactly, then (6.9) is 1 if and only if $B$ is 1 and, therefore, $P(X_0,\ldots,X_{N-1}) = B$. Thus, $2T\geq\deg(P) = \deg(B)$, and (1) follows. If $Q$ computes $B$ with a probability of at least $\frac23$, then (6.9) is at most $\frac13$ apart from 1 when $B$ is 1 and, similarly, (6.9) is at most $\frac13$ apart from 0 when $B$ is 0. This means that (6.9) is a polynomial approximating $B$, so $2T\geq\deg(P)\geq\widetilde{\deg}(B)$, and (2) follows. $\Box$

6.3.2 Some Examples

We can apply the main theorem (Theorem 6.3.1) of the previous section by taking $B = \mathrm{OR}$ to arrive at the following theorem.

Theorem 6.3.2. If $Q$ is a quantum algorithm that makes $T$ queries to a blackbox function $f$ and decides with a correctness probability of at least $\frac23$ whether there is an element $x\in\mathbb{F}_2^n$ such that $f(x)=1$, then $T\geq c\sqrt{2^n}$, where $c$ is a constant.

Proof. To decide whether $f(x)=1$ for some $x\in\mathbb{F}_2^n$ is to compute the function OR on the values $f(0),f(1),\ldots,f(2^n-1)$. According to Theorem 6.2.1, OR on $N = 2^n$ variables has an approximation degree of $\Omega(\sqrt N)$, so, by Theorem 6.3.1, computing OR on $N = 2^n$ variables (with a bounded probability of error) requires $\Omega(\sqrt N) = \Omega(\sqrt{2^n})$ queries to $f$. $\Box$

The above theorem shows that the method of using Grover's search algorithm to decide whether a solution for $f$ exists is optimal up to a constant factor. Similar results can be derived for other functions whose representation degree is known.

Theorem 6.3.3. For any quantum circuit, deciding whether a blackbox function $f:\mathbb{F}_2^n\to\mathbb{F}_2$ has an even number of solutions requires $\Omega(2^n)$ queries to $f$.

Proof. Take $B = \mathrm{PARITY}$, apply Paturi's theorem (Theorem 6.2.1), and use reasoning similar to the above proof. $\Box$

Remark 6.3.3. In [4], the authors also derive results analogous to the previous ones for so-called Las Vegas quantum circuits, which must always give the correct answer but which can sometimes be "ignorant", i.e., give the answer "I do not know", with a probability of at most $\frac12$. Moreover, in [4] it is shown that if a quantum circuit computes a Boolean function $B$ on the variables $X_0,X_1,\ldots,X_{N-1}$ using $T$ queries to $f$ with a nonvanishing error probability, then there is a classical deterministic algorithm that computes $B$ exactly and uses $O(T^6)$ queries to $f$. Moreover, if $B$ is symmetric, then $O(T^6)$ can even be replaced with $O(T^2)$.

Remark 6.3.4. It can be shown that only a vanishing ratio of the Boolean functions on $N$ variables have representation degree lower than $N$, and that the same holds true for the approximation degree [1]. Thus, for almost any $B$, computing $B$ on $f(0),f(1),\ldots,f(N-1)$, even with a bounded error probability, requires $\Omega(N)$ queries to $f$.

Remark 6.3.5. With a little effort, we can utilize the results presented in this section to say something about oracle Turing machine computation: for almost all oracles $X$, $\mathrm{NP}^X$ is not included in $\mathrm{BQP}^X$. However, this does not imply that NP is not included in BQP, but it offers some reason to believe so. On the other hand, blackbox functions are the most "unstructured" examples one can have; in [21] W. van Dam gives an example of breaking the blackbox lower bound by replacing an arbitrary $f$ with a more structured function.

7. Appendix A: Quantum Physics

7.1 A Brief History of Quantum Theory

We may say that quantum mechanics was born at the beginning of the twentieth century, when experiments on atoms, molecules, and radiation physics could not be explained by classical physics.
As an example, we mention the quantum hypothesis presented by Max Planck in 1900¹ in connection with the radiation of a black body. A black body is a hypothetical object that is able to emit and absorb electromagnetic radiation of all possible wavelengths. Although we cannot manufacture ideal black bodies, for experimental purposes a small hole in an enclosure is a good approximation of a black body. Theoretical considerations revealed that the radiation intensity at different frequencies² should be the same for all black bodies. In the nineteenth century, there were two contradictory theories describing how the curve depicting this intensity should look: the first was Wien's radiation law, the second the Rayleigh-Jeans law of radiation. Both of them failed to predict the observed frequency spectrum, i.e., the distribution describing how the intensity of the radiation depends on the frequency. The spectrum of the radiation became understandable when Planck published his radiation law [69]. An especially remarkable feature of Planck's work was that his law of radiation was derived under the assumption that electromagnetic radiation is emitted in discrete quanta whose energy $E$ is proportional to the frequency $\nu$:

$$E = h\nu. \qquad (7.1)$$

The constant $h$ was eventually uncovered by studying the experimental data. This famous constant is called Planck's constant, and its value is approximately given by

$$h = 6.62608\cdot10^{-34}\ \mathrm{J\,s}. \qquad (7.2)$$

¹ M. Planck introduced his ideas in a lecture at the German Physical Society on December 14, 1900. As a general reference to early works on quantum physics, we mention [69].
² We could just as well talk about the wavelength $\lambda$ instead of the frequency $\nu$; these quantities are related by the formula $\nu\lambda = c$, where $c$ is the vacuum speed of light.

Planck's quantum hypothesis contains a somewhat disturbing idea: since the days of Galileo and Newton, there had been a debate in the scientific community on whether light should be regarded as a particle flow or as waves. This discussion seemed to cease in the nineteenth century, mainly for the following reasons. First, at the beginning of the nineteenth century, Young demonstrated with his famous two-slit experiment that light inherently has an undulatory nature. The second reason was that research at the end of the nineteenth century, carried out mainly by Maxwell and Hertz, showed that light is, in fact, electromagnetic radiation, which is better described as waves. At least implicitly, Planck again called upon the old question about the nature of light: the explanation of black body radiation, which finally agreed with observations within measurement precision, was derived under the hypothesis that light is made of discrete quanta.

Later, in 1905, Albert Einstein explained the photoelectric effect, basing the explanation on Planck's quantum hypothesis [69]. The photoelectric effect means that negatively charged metal loses its electrical charge when it is exposed to electromagnetic radiation of a certain frequency. It was not understood why the photoelectric effect so fundamentally depended on the frequency of the radiation and not on the intensity. In fact, reducing the intensity will slow down the effect, but cutting down the frequency will completely stop it.
Einstein pointed out that, under Planck's hypothesis (7.1), this dependency on frequency becomes quite natural: the radiation should be regarded as a stream of particles whose energy is proportional to the frequency. It should then be quite clear that increasing the frequency of the radiation will also increase the impact of the radiation quanta on the electrons. When the impact becomes large enough, the electrons fly off and the negative charge disappears.³ Einstein also introduced the notion of the photon, a light quantum. Planck's radiation law and Einstein's explanation of the photoelectric effect opened the door for a novel idea: electromagnetic radiation, which was traditionally treated as a wave movement, possesses some characteristic features of a particle flow.

The experimental energy spectrum of the hydrogen atom was explained, up to the measurement precision, by Niels Bohr [69] in 1913. The model of the hydrogen atom introduced by Bohr provided some evidence that electrons, which had been considered particles, may have wave-like characteristics as well. More evidence of the wave-like characteristics of electrons came to light in the interference experiment carried out by C. J. Davisson and L. H. Germer in 1927.

³ It should be emphasized that the explanation given by Einstein is represented here only as a historical remark. According to modern quantum physics, the photoelectric effect can also be explained by assuming the undulatory nature of light and the quantum mechanical nature of the matter.

Inspired by the previous results, Louis de Broglie introduced a general hypothesis in 1924: particles may also be described as waves⁴ whose wavelength $\lambda$ can be uncovered when the momentum $p$ is known:

$$\lambda = \frac{h}{p}, \qquad (7.3)$$

where $h$ is Planck's constant (7.2). Other famous physicists who have greatly influenced the development of quantum physics in the form we know it today are W. Heisenberg, M. Born, P. Jordan, W. Pauli, and P. A. M. Dirac.

⁴ "There are not two worlds, one of light and waves, one of matter and corpuscles. There is only a single universe. Some of its properties can be accounted for by wave theory, and others by the corpuscular theory." (A citation from the lecture given by Louis de Broglie at the Nobel prize award ceremony in 1929.)

7.2 Mathematical Framework for Quantum Theory

In this section, we shall first introduce the formalism of quantum mechanics in a basic form based on state vectors. Later we will also study a more general formalism based on self-adjoint operators, and we will see that the state vector formalism can be recovered as a restriction of the more general one. The advantage of the state vector formalism is that it is mathematically simpler than the more general one.

In connection with quantum computation, we are primarily interested in representing a finite set⁵ by using a quantum mechanical system. Therefore, we will make another significant mathematical simplification by assuming that all the quantum systems handled in this chapter are finite-dimensional, unless explicitly stated otherwise.

⁵ In formal language theory, a finite set representing information is also called an alphabet.

In the introductory chapter, we stated that we can describe a probabilistic system (with a finite phase space) by using a probability distribution

$$p_1[x_1]+\cdots+p_n[x_n] \qquad (7.4)$$

over all the configurations $x_i$. In the above mixed state, $p_1+\cdots+p_n = 1$ and the system can be seen in state $x_i$ with a probability of $p_i$. A quantum mechanical counterpart of this system is called an $n$-level quantum system. To describe such a system, we choose a basis $|x_1\rangle,\ldots,|x_n\rangle$ of an $n$-dimensional Hilbert space $H_n$, and a general state of an $n$-level quantum system is described by a vector

$$a_1|x_1\rangle+\cdots+a_n|x_n\rangle, \qquad (7.5)$$

where each $a_i$ is a complex number called the amplitude of $x_i$, and $|a_1|^2+\cdots+|a_n|^2 = 1$. The basis $|x_1\rangle,\ldots,|x_n\rangle$ refers to an observable that can have some values, but for the moment we will ignore the values and say that the system may have properties $x_1,\ldots,x_n$ (with respect to the chosen basis). The probability that the system is seen to have property $x_i$ is $|a_i|^2$.

Representation (7.5) has some properties that (7.4) does not have. For example, the time evolution of (7.4) sends each basis vector again into a combination of all the basis vectors with non-negative coefficients that sum up to 1. On the other hand, (7.5) may evolve in a very different way, as the basis states can also cancel each other.

Example 7.2.1. Consider a quantum physical description of a two-state system having states 0 and 1. Assume also that the time evolution of the system (during a fixed time interval) is given by

$$|0\rangle\mapsto\frac{1}{\sqrt2}|0\rangle+\frac{1}{\sqrt2}|1\rangle, \qquad |1\rangle\mapsto\frac{1}{\sqrt2}|0\rangle-\frac{1}{\sqrt2}|1\rangle.$$

If the system starts in state $|0\rangle$ or $|1\rangle$ and undergoes the time evolution, the probability of observing 0 or 1 will be $\frac12$ in both cases. On the other hand, if the system starts in state $|0\rangle$ and undergoes the time evolution twice, the state will be

$$\frac{1}{\sqrt2}\Big(\frac{1}{\sqrt2}|0\rangle+\frac{1}{\sqrt2}|1\rangle\Big)+\frac{1}{\sqrt2}\Big(\frac{1}{\sqrt2}|0\rangle-\frac{1}{\sqrt2}|1\rangle\Big) = \frac12|0\rangle+\frac12|1\rangle+\frac12|0\rangle-\frac12|1\rangle = |0\rangle,$$

and the probability of observing 0 becomes 1 again. The effect whereby the amplitudes of $|1\rangle$ cancel each other is called destructive interference, and the effect whereby the coefficients of $|0\rangle$ amplify each other is called constructive interference. Destructive interference cannot occur in the evolution of the probability distribution (7.4), since all the coefficients are always non-negative real numbers. A probabilistic counterpart to the above quantum time evolution would be

$$[0]\mapsto\frac12[0]+\frac12[1], \qquad [1]\mapsto\frac12[0]+\frac12[1],$$

but the double time evolution beginning with state $[0]$ would give the state $\frac12[0]+\frac12[1]$ as the outcome.
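Example 7.2.1 is easy to replay numerically; the following throwaway NumPy lines (mine, not the book's) apply the two-state evolution once and twice to $|0\rangle$ and print the outcome probabilities:

```python
import numpy as np

# Time evolution of Example 7.2.1 (a Hadamard-type transform).
U = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2)

zero = np.array([1.0, 0.0])                 # state |0>
once, twice = U @ zero, U @ (U @ zero)

print(np.abs(once) ** 2)    # [0.5 0.5]: uniformly random after one step
print(np.abs(twice) ** 2)   # [1. 0.]: destructive interference restores |0>
```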
Amplitude distribution (7.5) can naturally be interpreted as a unit-length vector of a Hilbert space. Therefore, the following sections are devoted to the study of Hilbert spaces (see also Section 8.3).

7.2.1 Hilbert Spaces

A finite-dimensional Hilbert space $H$ is a complete (completeness will be defined later) vector space over the complex numbers which is equipped with an inner product

$$H\times H\to\mathbb{C}, \qquad (x,y)\mapsto\langle x\mid y\rangle.$$

All $n$-dimensional Hilbert spaces are isomorphic, and we can, therefore, denote any such space by $H_n$. An inner product is required to satisfy the following axioms for all $x,y,z\in H$ and $c_1,c_2\in\mathbb{C}$:

1. $\langle x\mid y\rangle = \langle y\mid x\rangle^*$.
2. $\langle x\mid x\rangle\geq0$, and $\langle x\mid x\rangle = 0$ if and only if $x = 0$.
3. $\langle x\mid c_1y+c_2z\rangle = c_1\langle x\mid y\rangle+c_2\langle x\mid z\rangle$.

If $E = \{e_1,\ldots,e_n\}$ is an orthonormal basis of $H$, each vector $x\in H$ can be represented as $x = x_1e_1+\cdots+x_ne_n$. With a fixed basis $E$, we also use the coordinate representation $x = (x_1,\ldots,x_n)$. Using this representation, one can easily see that an orthonormal basis induces an inner product by

$$\langle x\mid y\rangle = x_1^*y_1+\cdots+x_n^*y_n. \qquad (7.6)$$

On the other hand, each inner product is induced by some orthonormal basis as in (7.6).
For, if $\langle\cdot\mid\cdot\rangle$ stands for an arbitrary inner product, we may use the Gram-Schmidt process (see Section 8.3) to find a basis $\{b_1,\ldots,b_n\}$ orthonormal with respect to $\langle\cdot\mid\cdot\rangle$. Then

$$\langle x\mid y\rangle = \langle x_1b_1+\cdots+x_nb_n\mid y_1b_1+\cdots+y_nb_n\rangle = x_1^*y_1+\cdots+x_n^*y_n.$$

The inner product induces a vector norm in a very natural way:

$$\|x\| = \sqrt{\langle x\mid x\rangle}.$$

Informally speaking, the completeness of the vector space $H$ means that there are enough vectors in $H$, i.e., at least one for each limit process:

Definition 7.2.1. A vector space $H$ is complete if, for each vector sequence $x_i$ such that $\lim_{m,n\to\infty}\|x_m-x_n\| = 0$, there exists a vector $x\in H$ such that $\lim_{n\to\infty}\|x_n-x\| = 0$.

An important geometric property of Hilbert spaces is that each subspace $W\subseteq H$ which is also a Hilbert space⁶ has an orthogonal complement. The proofs of the following lemmata are left as exercises.

⁶ In finite-dimensional Hilbert spaces, all the subspaces are also Hilbert spaces.

Lemma 7.2.1. Let $W$ be a subspace of $H$. Then the set of vectors

$$W^\perp = \{y\in H\mid\langle y\mid x\rangle = 0 \text{ whenever } x\in W\}$$

is also a subspace of $H$, called the orthogonal complement of $W$.

Lemma 7.2.2. If $W$ is a subspace of a finite-dimensional Hilbert space $H$, then $H = W\oplus W^\perp$.

7.2.2 Operators

The modern description of quantum mechanics is profoundly based on linear mappings. In the following sections, we present the features of linear mappings which are most essential for quantum mechanics. Because we mainly concentrate on finite-level quantum systems, the vector spaces treated hereafter will be assumed to have finite dimension, unless explicitly stated otherwise.

Let us begin with some terminology: a linear mapping $H\to H$ is called an operator. The set of operators on $H$ is denoted by $L(H)$. For an operator $T$, we define the norm of the operator by

$$\|T\| = \sup_{\|x\|=1}\|Tx\|.$$

A nonzero vector $x\in H$ is an eigenvector of $T$ belonging to the eigenvalue $\lambda\in\mathbb{C}$ if $Tx = \lambda x$.

Definition 7.2.2. For any operator $T:H\to H$, the adjoint operator $T^*$ is defined by the requirement

$$\langle x\mid Ty\rangle = \langle T^*x\mid y\rangle$$

for all $x,y\in H$.

Remark 7.2.1. With a fixed basis $\{e_1,\ldots,e_n\}$ of $H$, any operator $T$ can be represented as an $n\times n$ matrix over the field of complex numbers. It is not difficult to see that the matrix representing the adjoint operator $T^*$ is the transposed complex conjugate of the matrix representing $T$.

It can be shown that the quantity

$$\mathrm{Tr}(T) = \sum_{i=1}^n\langle e_i\mid Te_i\rangle \qquad (7.7)$$

is independent of the choice of the orthonormal basis (Exercise 1). Quantity (7.7) is called the trace of the operator $T$. By its very definition, it is clear that the trace is linear. Notice that in the matrix representation of $T$, the trace is the sum of the diagonal elements.

Definition 7.2.3. An operator $T$ is self-adjoint if $T^* = T$. An operator $T$ is unitary if $T^* = T^{-1}$.

The following simple lemmata will be used frequently.

Lemma 7.2.3. A self-adjoint operator has real eigenvalues.

Proof. If $Ax = \lambda x$, then

$$\lambda^*\langle x\mid x\rangle = \langle\lambda x\mid x\rangle = \langle Ax\mid x\rangle = \langle x\mid Ax\rangle = \lambda\langle x\mid x\rangle.$$

Since $x\neq0$ as an eigenvector, it follows that $\lambda^* = \lambda$. $\Box$

Lemma 7.2.4. The eigenvectors of a self-adjoint operator belonging to distinct eigenvalues are orthogonal.

Proof. Assume that $\lambda\neq\lambda'$, $Ax = \lambda x$, and $Ax' = \lambda'x'$. Since $\lambda$ and $\lambda'$ are real by the previous lemma,

$$\lambda'\langle x'\mid x\rangle = \langle Ax'\mid x\rangle = \langle x'\mid Ax\rangle = \lambda\langle x'\mid x\rangle,$$

and therefore $\langle x'\mid x\rangle = 0$. $\Box$

A self-adjoint operator $T$ is positive if $\langle x\mid Tx\rangle\geq0$ for each $x\in H$.
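The basis-independence of the trace (7.7) invites a one-off numerical check; a small script of my own, computing $\sum_i\langle e_i\mid Te_i\rangle$ in two different orthonormal bases of $H_3$:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 3
T = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))  # arbitrary operator

def trace_in_basis(T, E):
    """Sum of <e_i | T e_i> over the columns e_i of the unitary matrix E."""
    return sum(np.vdot(E[:, i], T @ E[:, i]) for i in range(n))

I = np.eye(n)                                # standard basis
Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
print(np.isclose(trace_in_basis(T, I), trace_in_basis(T, Q)))   # True
```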
If $W$ is a subspace of $H$ and $H = W\oplus W^\perp$, then each vector $x\in H$ can be uniquely represented as $x = x_W+x_{W^\perp}$, where $x_W\in W$ and $x_{W^\perp}\in W^\perp$. It is quite easy to verify that the mapping $P_W$ defined by $P_W(x_W+x_{W^\perp}) = x_W$ is a self-adjoint linear mapping, called the projection onto the subspace $W$. Clearly $P_W^2 = P_W$. On the other hand, it can be shown that each self-adjoint $P$ such that $P^2 = P$ is a projection onto some subspace of $H$ (Exercise 2). Notice that a projection can never increase the norm:

$$\|P_Wx\|^2 = \|x_W\|^2 \leq \|x_W\|^2+\|x_{W^\perp}\|^2 = \|x\|^2.$$

The last equality, which is called Pythagoras' theorem, follows from the fact that $x_W$ and $x_{W^\perp}$ are orthogonal. Projections play an important role in the theory of self-adjoint operators, so we will study them in more detail.

7.2.3 Spectral Representation of Self-adjoint Operators

For any vectors $x$ and $y\in H_n$, we define a linear mapping $|x\rangle\langle y|:H_n\to H_n$ by

$$|x\rangle\langle y|\,z = \langle y\mid z\rangle\,x.$$

It is plain to see that, if $\|x\| = 1$, then $|x\rangle\langle x|$ is the projection onto the one-dimensional subspace generated by $x$. To introduce the spectral decomposition of a self-adjoint operator $T$, we present the following well-known result:

Theorem 7.2.1. Let $T:H_n\to H_n$ be a self-adjoint operator. There exists an orthonormal basis of $H_n$ consisting of eigenvectors of $T$.

Proof. The proof is by induction on $n = \dim H_n$. A one-dimensional $H_1$ is generated by some single vector $e_1$ of unit length. Clearly $Te_1 = \lambda e_1$ for some $\lambda\in\mathbb{C}$, so the claim is obvious. If $n>1$, let $\lambda$ be an eigenvalue of $T$ and let $W$ be the eigenspace of $\lambda$, that is, the subspace of $H_n$ generated by the eigenvectors belonging to $\lambda$. Any linear combination of the eigenvectors belonging to $\lambda$ is again an eigenvector belonging to $\lambda$, which implies that $W$ is closed under $T$. But the orthogonal complement $W^\perp$ is also closed under $T$: in fact, take any $x\in W$ and $y\in W^\perp$. Then

$$\langle x\mid Ty\rangle = \langle Tx\mid y\rangle = \lambda^*\langle x\mid y\rangle = 0,$$

which means that $Ty\in W^\perp$ as well. Therefore, we may study the restrictions $T:W\to W$ and $T:W^\perp\to W^\perp$ and apply the induction hypothesis to find orthonormal bases of $W$ and $W^\perp$ consisting of eigenvectors of $T$. Since $H_n = W\oplus W^\perp$, the union of these bases satisfies the claim. $\Box$

We can now introduce the spectral representation, a powerful tool in the study of self-adjoint mappings. Let $x_1,\ldots,x_n$ be orthonormal eigenvectors of a self-adjoint operator $T:H_n\to H_n$ and let $\lambda_1,\ldots,\lambda_n$ be the corresponding eigenvalues. The numbers $\lambda_i$ are real by Lemma 7.2.3, but they are not necessarily distinct. The set of eigenvalues is called the spectrum, and it can be easily verified (Exercise 3) that

$$T = \lambda_1|x_1\rangle\langle x_1|+\cdots+\lambda_n|x_n\rangle\langle x_n|. \qquad (7.8)$$

Decomposition (7.8) is called a spectral representation of $T$. If $T$ has multiple eigenvalues, we say that $T$ is degenerate; otherwise, $T$ is non-degenerate. The spectral representation (7.8) of a non-degenerate $T$ is unique, as is easily verified. Notice that we do not claim that the vectors $x_i$ are unique, only that the projections $|x_i\rangle\langle x_i|$ are. If $T$ is degenerate, we can collect the multiple eigenvalues in (7.8) as common factors to obtain a representation

$$T = \lambda_1'P_1+\cdots+\lambda_{n'}'P_{n'}, \qquad (7.9)$$

where $P_1,\ldots,P_{n'}$ are the projections onto the eigenspaces of the distinct eigenvalues $\lambda_1',\ldots,\lambda_{n'}'$. It is easy to see that the spectral representation (7.9) is unique.⁷

Recall that all the eigenvectors belonging to distinct eigenvalues are orthogonal, which implies that, in representation (7.9), all the projections are projections onto mutually orthogonal subspaces. Therefore, $P_iP_j = 0$ whenever $i\neq j$. It follows that, if $p$ is a polynomial, then

$$p(T) = p(\lambda_1')P_1+\cdots+p(\lambda_{n'}')P_{n'}. \qquad (7.10)$$

⁷ Some authors call only the unique representation (7.9) a spectral representation, not representation (7.8).
We also generalize (7.10): if $f:\mathbb{R}\to\mathbb{C}$ is any function and (7.9) is the spectral representation of $T$, we define

$$f(T) = f(\lambda_1')P_1+\cdots+f(\lambda_{n'}')P_{n'}. \qquad (7.11)$$

Example 7.2.2. The matrix

$$M_\lnot = \begin{pmatrix}0&1\\1&0\end{pmatrix}$$

defines a self-adjoint mapping in $H_2$, as is easily verified. The matrix $M_\lnot$ is non-degenerate, having 1 and $-1$ as eigenvalues. The corresponding orthonormal eigenvectors can be chosen as $x_1 = \frac{1}{\sqrt2}(1,1)^T$ and $x_{-1} = \frac{1}{\sqrt2}(1,-1)^T$, for example. The matrix representations of the projections $|x_{\pm1}\rangle\langle x_{\pm1}|$ are easily uncovered: they are

$$\begin{pmatrix}\frac12&\frac12\\[2pt]\frac12&\frac12\end{pmatrix} \quad\text{and}\quad \begin{pmatrix}\frac12&-\frac12\\[2pt]-\frac12&\frac12\end{pmatrix},$$

respectively. Thus, the spectral representation of $M_\lnot$ is given by

$$\begin{pmatrix}0&1\\1&0\end{pmatrix} = 1\cdot\begin{pmatrix}\frac12&\frac12\\[2pt]\frac12&\frac12\end{pmatrix} - 1\cdot\begin{pmatrix}\frac12&-\frac12\\[2pt]-\frac12&\frac12\end{pmatrix}.$$

We can now find a square root of $M_\lnot$:

$$\sqrt{M_\lnot} = \sqrt1\cdot\begin{pmatrix}\frac12&\frac12\\[2pt]\frac12&\frac12\end{pmatrix} + \sqrt{-1}\cdot\begin{pmatrix}\frac12&-\frac12\\[2pt]-\frac12&\frac12\end{pmatrix}.$$

Choosing $\sqrt1 = 1$ and $\sqrt{-1} = i$, we obtain the matrix of Example 2.2.1. By fixing the square roots above in all possible ways, we get four different matrices $X$ that satisfy the condition $X^2 = M_\lnot$.
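Example 7.2.2 can be reproduced mechanically: a NumPy sketch of mine that recovers the spectral decomposition of the matrix above with an eigensolver and applies the square root through (7.11), taking $\sqrt{-1} = i$ as in the example:

```python
import numpy as np

M = np.array([[0.0, 1.0], [1.0, 0.0]])       # the matrix of Example 7.2.2

# Spectral decomposition M = sum_i lambda_i |x_i><x_i|.
lam, X = np.linalg.eigh(M)                   # columns of X: orthonormal eigenvectors
P = [np.outer(X[:, i], X[:, i].conj()) for i in range(2)]
assert np.allclose(M, sum(l * p for l, p in zip(lam, P)))

# f(M) for f = sqrt, with sqrt of the negative eigenvalue taken as imaginary.
sqrtM = sum(np.emath.sqrt(l) * p for l, p in zip(lam, P))
print(np.allclose(sqrtM @ sqrtM, M))         # True: a square root of M
```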
7.2.4 Spectral Representation of Unitary Operators

Let us study the function $e^{iT}$, defined on a self-adjoint operator $T$ given by the spectral representation $T = \lambda_1|x_1\rangle\langle x_1| + \cdots + \lambda_n|x_n\rangle\langle x_n|$. By definition,

$$e^{iT} = e^{i\lambda_1}|x_1\rangle\langle x_1| + \cdots + e^{i\lambda_n}|x_n\rangle\langle x_n|,$$

and we notice that

$$(e^{iT})^* = e^{-i\lambda_1}|x_1\rangle\langle x_1| + \cdots + e^{-i\lambda_n}|x_n\rangle\langle x_n| = (e^{iT})^{-1},$$

which is to say that $e^{iT}$ is unitary. We will now show that each unitary mapping can be represented as $e^{iT}$, where $T$ is a self-adjoint operator. To do this, we first derive an auxiliary result which also has independent interest.

Lemma 7.2.5. Self-adjoint operators $A$ and $B$ commute if and only if $A$ and $B$ share an orthonormal eigenvector basis.

Proof. If $\{a_1, \ldots, a_n\}$ is a set of orthonormal eigenvectors of both $A$ and $B$, then, by using the corresponding eigenvalues, we can write $A = \lambda_1|a_1\rangle\langle a_1| + \cdots + \lambda_n|a_n\rangle\langle a_n|$ and $B = \mu_1|a_1\rangle\langle a_1| + \cdots + \mu_n|a_n\rangle\langle a_n|$. Hence

$$AB = \lambda_1\mu_1|a_1\rangle\langle a_1| + \cdots + \lambda_n\mu_n|a_n\rangle\langle a_n| = BA.$$

Assume, then, that $AB = BA$. Let $\lambda_1, \ldots, \lambda_h$ be all the distinct eigenvalues of $A$ and let $a_1^{(k)}, \ldots, a_{m_k}^{(k)}$ be orthonormal eigenvectors belonging to $\lambda_k$. For any $a_i^{(k)}$, we have

$$ABa_i^{(k)} = BAa_i^{(k)} = B\lambda_k a_i^{(k)} = \lambda_k Ba_i^{(k)},$$

i.e., $Ba_i^{(k)}$ is also an eigenvector of $A$ belonging to eigenvalue $\lambda_k$. Therefore, the eigenspace of $\lambda_k$ is closed under $B$. We denote this subspace by $W_k$. As a restriction of a self-adjoint operator, $B : W_k \to W_k$ is clearly also self-adjoint. Therefore, $B$ has $m_k$ eigenvectors $b_1^{(k)}, \ldots, b_{m_k}^{(k)}$ that form an orthonormal basis of $W_k$. By finding such a basis for each $W_k$, we obtain an orthonormal system such that:

• Each $b_i^{(k)}$ is an eigenvector of $B$ and of $A$ (for $A$, $b_i^{(k)}$ is an eigenvector belonging to $\lambda_k$).
• If $i \ne j$, then $b_i^{(k)}$ and $b_j^{(k)}$ are orthogonal, since they belong to an orthonormal basis generating $W_k$.
• If $k \ne k'$, then $b_i^{(k)}$ and $b_j^{(k')}$ are orthogonal, since they are eigenvectors of $A$ belonging to distinct eigenvalues $\lambda_k$ and $\lambda_{k'}$.

Thus, the system obtained is an orthonormal basis of $H_n$ but also a set of eigenvectors of both $A$ and $B$. □

Let now $U$ be a unitary operator. We notice, first, that the eigenvalues of $U$ have absolute value $1$. To verify this, let $x$ be an eigenvector belonging to eigenvalue $\lambda$, i.e., $Ux = \lambda x$. Then

$$\langle x\mid x\rangle = \langle x\mid U^*Ux\rangle = \langle Ux\mid Ux\rangle = \langle\lambda x\mid\lambda x\rangle = |\lambda|^2\langle x\mid x\rangle,$$

and it follows that $|\lambda| = 1$. We decompose $U$ into "real and imaginary parts" by writing $U = A + iB$, where $A = \frac12(U + U^*)$ and $B = \frac{1}{2i}(U - U^*)$. Note that $A$ and $B$ are self-adjoint, commuting operators. According to the previous lemma, we have spectral representations

$$A = \lambda_1|x_1\rangle\langle x_1| + \cdots + \lambda_n|x_n\rangle\langle x_n| \quad\text{and}\quad B = \mu_1|x_1\rangle\langle x_1| + \cdots + \mu_n|x_n\rangle\langle x_n|$$

with a common orthonormal eigenvector basis. Since the eigenvalues of $U$ have absolute value $1$, the eigenvalues of $A$ and $B$, i.e., the numbers $\lambda_i$ and $\mu_i$, have absolute values of at most $1$. But since $A$ and $B$ are self-adjoint, the numbers $\lambda_i$ and $\mu_i$ are also real. Thus, $U$ can be written as

$$U = (\lambda_1 + i\mu_1)|x_1\rangle\langle x_1| + \cdots + (\lambda_n + i\mu_n)|x_n\rangle\langle x_n|,$$

where the $\lambda_j$ and $\mu_j$ are real numbers in the interval $[-1,1]$. Moreover, each $x_j$ is an eigenvector of $U$ with eigenvalue $\lambda_j + i\mu_j$, so $\lambda_j^2 + \mu_j^2 = 1$. Thus, there exists a unique $\theta_j \in [0, 2\pi)$ for each $j$ such that $\lambda_j = \cos\theta_j$ and $\mu_j = \sin\theta_j$. It follows that $\lambda_j + i\mu_j = e^{i\theta_j}$, and $U$ can be expressed as $U = e^{iH}$, where

$$H = \theta_1|x_1\rangle\langle x_1| + \cdots + \theta_n|x_n\rangle\langle x_n|.$$

The representation $U = e^{iH}$ is called the spectral representation of the unitary operator $U$. The operator $H$ (or $-H$) is sometimes called the Hamilton operator which induces $U$. Notice also that the way we derived the spectral representation of a unitary operator gives us, as a byproduct, the knowledge that a unitary matrix has eigenvectors that form an orthonormal basis of $H_n$. The spectral representation of a unitary operator could, of course, have been derived directly, without using the decomposition $U = A + iB$. In any case, the spectral representation is useful when decomposing a unitary matrix into a product of simple unitary matrices. Notice also that if $T_1$ and $T_2$ commute, then clearly $e^{i(T_1+T_2)} = e^{iT_1}e^{iT_2}$. In the following examples, we will illustrate how unitary matrices and the Hamiltonians which induce them are related.

Example 7.2.3. The Hadamard-Walsh matrix

$$W_2 = \frac{1}{\sqrt2}\begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix}$$

is unitary but also self-adjoint. Thus, the decomposition $W_2 = A + iB$ is just $W_2$ itself. The matrix $W_2$ has two eigenvalues, $-1$ and $1$, and the corresponding orthonormal eigenvectors can be chosen as $x_{-1} = \frac{1}{\sqrt{4-2\sqrt2}}(1-\sqrt2, 1)^T$ and $x_1 = \frac{1}{\sqrt{4+2\sqrt2}}(1+\sqrt2, 1)^T$. The corresponding projections can be expressed as

$$|x_1\rangle\langle x_1| = \frac{1}{4+2\sqrt2}\begin{pmatrix}3+2\sqrt2 & 1+\sqrt2\\ 1+\sqrt2 & 1\end{pmatrix} \quad\text{and}\quad |x_{-1}\rangle\langle x_{-1}| = \frac{1}{4-2\sqrt2}\begin{pmatrix}3-2\sqrt2 & 1-\sqrt2\\ 1-\sqrt2 & 1\end{pmatrix}.$$

The spectral representation of the self-adjoint mapping $W_2$ is thus given by $W_2 = 1\cdot|x_1\rangle\langle x_1| - 1\cdot|x_{-1}\rangle\langle x_{-1}|$, which can also be written as

$$W_2 = e^{i\cdot 0}|x_1\rangle\langle x_1| + e^{i\pi}|x_{-1}\rangle\langle x_{-1}|,$$

so $W_2 = e^{iH}$, where $H = \pi|x_{-1}\rangle\langle x_{-1}|$.

Example 7.2.4. Let $\theta$ be a real number and

$$R_\theta = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}$$

be the rotation matrix. Clearly, $R_\theta$ is unitary. Now that we know that unitary matrices also have eigenvectors forming an orthonormal basis, we can find them directly, without seeking a decomposition $R_\theta = A + iB$. It is an easy task to verify that the eigenvalues of $R_\theta$ are $e^{\pm i\theta}$, and the corresponding eigenvectors can be chosen as $x_+ = \frac{1}{\sqrt2}(i, 1)^T$ and $x_- = \frac{1}{\sqrt2}(-i, 1)^T$. The corresponding projections are given by

$$|x_+\rangle\langle x_+| = \frac12\begin{pmatrix}1 & i\\ -i & 1\end{pmatrix} \quad\text{and}\quad |x_-\rangle\langle x_-| = \frac12\begin{pmatrix}1 & -i\\ i & 1\end{pmatrix}.$$

Thus $R_\theta = e^{i\theta}|x_+\rangle\langle x_+| + e^{-i\theta}|x_-\rangle\langle x_-| = e^{iH_\theta}$, where

$$H_\theta = \begin{pmatrix}0 & i\theta\\ -i\theta & 0\end{pmatrix}.$$

Example 7.2.5. Let $\alpha$ and $\beta$ be real numbers. The phase shift matrix

$$P_{\alpha,\beta} = \begin{pmatrix}e^{i\alpha} & 0\\ 0 & e^{i\beta}\end{pmatrix}$$

is unitary, as is easily verified. The spectral decomposition is now trivial:

$$P_{\alpha,\beta} = e^{i\alpha}\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix} + e^{i\beta}\begin{pmatrix}0 & 0\\ 0 & 1\end{pmatrix}.$$
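A Hamilton operator inducing a given unitary can be recovered numerically along the lines of this derivation: take the phases $\theta_j$ of the eigenvalues and assemble $H = \sum_j\theta_j|x_j\rangle\langle x_j|$. The sketch below is an added illustration for the rotation matrix of Example 7.2.4; it assumes distinct eigenvalues, in which case the unit eigenvectors returned by the solver are automatically orthogonal (the matrix is normal).

    import numpy as np
    from scipy.linalg import expm

    theta = 0.7
    U = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]], dtype=complex)

    eigvals, V = np.linalg.eig(U)            # eigenvalues e^{+i theta}, e^{-i theta}
    phases = np.angle(eigvals) % (2*np.pi)   # theta_j in [0, 2*pi), lambda_j = e^{i theta_j}
    H = sum(t * np.outer(V[:, j], V[:, j].conj()) for j, t in enumerate(phases))

    assert np.allclose(H, H.conj().T)        # H is self-adjoint
    assert np.allclose(expm(1j * H), U)      # U = e^{iH}, cf. Examples 7.2.3-7.2.5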
7.3 Quantum States as Hilbert Space Vectors

Let us return to the study of finite-level quantum systems. The mathematical description of such a system is based on an $n$-dimensional Hilbert space $H_n$. We choose an orthonormal basis $\{|x_1\rangle, \ldots, |x_n\rangle\}$ and call the vectors $|x_i\rangle$ basis states. A state of the system is a unit-length vector in $H_n$. States other than the basis ones are called superpositions of the basis states. Two states $|x\rangle$ and $|y\rangle$ are equivalent if $|x\rangle = e^{i\theta}|y\rangle$ for some real $\theta$. Hereafter, we will regard equivalent states as equal. A general state

$$|x\rangle = \alpha_1|x_1\rangle + \cdots + \alpha_n|x_n\rangle \qquad (7.12)$$

determines a probability distribution: if the system in state (7.12) is observed, the system will be seen in a basis state $|x_i\rangle$ with probability $|\alpha_i|^2$. The coefficients $\alpha_i$ are called amplitudes. We will study the observations in more detail in Section 7.3.2. For now, we present a mathematical description of compound quantum systems: suppose we have $n$- and $m$-state distinguishable systems⁸ with bases $\{|x_1\rangle, \ldots, |x_n\rangle\}$ of $H_n$ and $\{|y_1\rangle, \ldots, |y_m\rangle\}$ of $H_m$, respectively. The compound system made of these subsystems is described by the tensor product $H_n \otimes H_m \cong H_{nm}$. The basis states of $H_n \otimes H_m$ are the vectors $|x_i\rangle\otimes|y_j\rangle$, where $i\in\{1,\ldots,n\}$ and $j\in\{1,\ldots,m\}$. We also use the notations $|x_i\rangle\otimes|y_j\rangle = |x_i\rangle|y_j\rangle = |x_i, y_j\rangle$. A general state $|z\rangle$ of the compound system is a unit-length vector in $H_{nm}$. We say that a state $|z\rangle$ is decomposable if $|z\rangle = |x\rangle|y\rangle$ for some states $|x\rangle\in H_n$ and $|y\rangle\in H_m$. A state that is not decomposable is entangled. The inner product in the space $H_n\otimes H_m$ is defined by

$$\langle x_1\otimes y_1\mid x_2\otimes y_2\rangle = \langle x_1\mid x_2\rangle\langle y_1\mid y_2\rangle.$$

⁸ The requirement that the systems are distinguishable is essential here. A compound system of two identical subsystems has a different description.

Example 7.3.1. A two-level quantum system is called a qubit. The orthonormal basis vectors chosen for $H_2$ are usually denoted by $|0\rangle$ and $|1\rangle$ and are identified with logical 0 and 1. A compound system of $m$ qubits is called a quantum register of length $m$ and is described by a Hilbert space of dimension $2^m$ with basis

$$\{|x_1\rangle|x_2\rangle\cdots|x_m\rangle \mid x_i\in\{0,1\}\}.$$

We can identify the binary sequence $x_1, \ldots, x_m$ with the number $x_12^{m-1} + x_22^{m-2} + \cdots + x_{m-1}2 + x_m$. Using this identification, the basis of $H_{2^m}$ can be expressed as $\{|0\rangle, |1\rangle, \ldots, |2^m-1\rangle\}$.
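The probability rule attached to (7.12) is easy to simulate; a minimal added sketch for a length-2 quantum register (the particular amplitudes are an arbitrary example):

    import numpy as np

    # Amplitudes of a 2-qubit register state in the basis {|0>, |1>, |2>, |3>}:
    # (1/sqrt(2))|00> + (1/2)|10> + (1/2)|11>, i.e., basis indices 0, 2, 3.
    state = np.array([1/np.sqrt(2), 0, 1/2, 1/2], dtype=complex)
    assert np.isclose(np.linalg.norm(state), 1)     # states are unit vectors

    probs = np.abs(state)**2                        # |alpha_i|^2
    rng = np.random.default_rng(0)
    samples = rng.choice(4, size=10_000, p=probs)   # simulated observations
    for i in range(4):
        print(f"|{i:02b}>  p = {probs[i]:.3f}  freq = {(samples == i).mean():.3f}")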
7.3.1 Quantum Time Evolution

Our next task is to describe how quantum systems change in time. For that purpose, we will assume, as is usually done, that there is some function $U_t : H_n\to H_n$ which depends on the time $t$ and describes the time evolution of the system.⁹ In other words, denoting the state of the system at time $t$ by $x(t)$, we have $x(t) = U_tx(0)$.

⁹ This assumption is referred to as the causality principle.

We will now list some requirements that are usually put on the time-evolution mapping $U_t$. The very first requirement put on $U_t$ is that it should map unit-length states to unit-length states; that is, it should preserve the norm:

1. For each $t\in\mathbb{R}$ and each $x\in H_n$, $\|U_tx\| = \|x\|$.

Notice that the above requirement is quite mathematical: it just states that, if the coefficients of $x$ determine a probability distribution as in (7.12), then the coefficients of $U_tx$ should do so as well. The second requirement, on the other hand, is that the superposition (7.12) should resemble a probability distribution so strongly that each $U_t$ operates on (7.12) in such a way that each basis state evolves independently:

2. For each $t$, $U_t(\alpha_1|x_1\rangle + \cdots + \alpha_n|x_n\rangle) = \alpha_1U_t|x_1\rangle + \cdots + \alpha_nU_t|x_n\rangle$. This means that each $U_t$ is linear.

We could seriously argue about the above requirement,¹⁰ but that is not the purpose of the present book. The next requirement is that $U_t$ can always be decomposed:

3. For all $t_1$ and $t_2\in\mathbb{R}$, $U_{t_1+t_2} = U_{t_1}U_{t_2}$.

Finally, we require that the time evolution must be smooth:

4. For each $t_0\in\mathbb{R}$, $\lim_{t\to t_0}U_tx(0) = \lim_{t\to t_0}x(t) = x(t_0)$.

¹⁰ If a state (7.12) is not just a generalization of a probability distribution, why should it evolve like one?

From the above requirements, we can derive the following characteristic result.

Lemma 7.3.1. A time-evolution mapping $U_t$ satisfying 1-3 is a unitary operator.

Proof. Condition 2 states that $U_t$ is an operator, and from condition 1 it follows directly that each $U_t$ is injective. Requirement 3 implies that $U_0 = U_0^2$. But $U_0|x\rangle = U_0^2|x\rangle$ implies, by the injectivity, that $|x\rangle = U_0|x\rangle$ for any $|x\rangle$, which is to say that $U_0$ is the identity mapping. By again using requirement 3, we see that $|x\rangle = U_tU_{-t}|x\rangle$, i.e., each $U_t$ is also surjective. According to requirement 1, each $U_t$ preserves the norm, and it follows by the polarization equation (Exercise 4)

$$\langle x\mid y\rangle = \frac14\left(\|x+y\|^2 - \|x-y\|^2 - i\|x+iy\|^2 + i\|x-iy\|^2\right) \qquad (7.13)$$

that a linear mapping preserving the norm preserves the inner products as well. Therefore,

$$\langle U_tx\mid U_ty\rangle = \langle x\mid y\rangle,$$

which means, by its very definition, that $U_t$ is unitary. □

Condition 4 is still unused. Its place is found after the following theorem by M. Stone; see [49].

Theorem 7.3.1. If each $U_t$ satisfies 1-4, then there is a unique self-adjoint operator $H$ such that

$$U_t = e^{-itH}.$$

Recall that the exponential function of a self-adjoint operator $A$ can be defined by using the spectral representation (7.11), or even as the series

$$e^A = I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots.$$

It is clear that the definitions coincide in a finite-dimensional $H_n$. By Stone's theorem, the time evolution can be expressed as $x(t) = e^{-itH}x(0)$, from which we get

$$\frac{d}{dt}x(t) = -iHe^{-itH}x(0) = -iHx(t)$$

by component-wise differentiation. This can also be written as

$$i\frac{d}{dt}x(t) = Hx(t). \qquad (7.14)$$

Differential equation (7.14) is known as the abstract Schrödinger equation, and the operator $H$ is called the Hamilton operator of the quantum system. In a typical situation, the Hamilton operator (which represents the total energy of the system) is known, and the Schrödinger equation is used to determine the energy eigenstates, which form an example of the basis states of a quantum system.

Remark 7.3.1. The time evolution described by the operators $U_t$ is continuous but, in connection with computational aspects, we are interested in the system state at discrete time points $t_1, t_2, t_3, \ldots$. Therefore, we will regard the quantum system evolution as a sequence of unit-length vectors $x, U_1x, U_2U_1x, U_3U_2U_1x, \ldots$, where each $U_i$ is unitary.
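A minimal numerical sketch of this picture (an added illustration; the Hamiltonian H is an arbitrary example, and scipy's matrix exponential plays the role of $e^{-itH}$) checks requirements 1 and 3 and the Schrödinger equation (7.14) by finite differences:

    import numpy as np
    from scipy.linalg import expm

    H = np.array([[1.0, 0.5], [0.5, -1.0]], dtype=complex)  # a self-adjoint Hamiltonian
    U = lambda t: expm(-1j * t * H)                          # U_t = e^{-itH}

    t1, t2 = 0.3, 1.1
    assert np.allclose(U(t1 + t2), U(t1) @ U(t2))            # requirement 3
    x0 = np.array([1, 0], dtype=complex)
    xt = U(t1) @ x0
    assert np.isclose(np.linalg.norm(xt), 1)                 # requirement 1

    # Schrodinger equation (7.14): i d/dt x(t) = H x(t).
    eps = 1e-6
    lhs = 1j * (U(t1 + eps) @ x0 - U(t1 - eps) @ x0) / (2 * eps)
    assert np.allclose(lhs, H @ xt, atol=1e-5)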
7.3.2 Observables

We again study a general state

$$x = \alpha_1x_1 + \cdots + \alpha_nx_n. \qquad (7.15)$$

The basis states were introduced in connection with the distinguishable properties we are interested in, and a general state, i.e., a superposition of basis states, induces a probability distribution over the basis states: a basis state $x_i$ is observed with probability $|\alpha_i|^2$. This will be generalized as follows:

Definition 7.3.1. An observable $E = \{E_1, \ldots, E_m\}$ is a collection of mutually orthogonal subspaces of $H_n$ such that $H_n = E_1\oplus\cdots\oplus E_m$.

In the above definition, the inequality $m\le n$ must, of course, hold. We equip the subspaces $E_i$ with distinct real-number "labels" $\theta_1, \ldots, \theta_m$. For each vector $x\in H_n$, there is a unique representation

$$x = x_1 + \cdots + x_m \qquad (7.16)$$

such that $x_i\in E_i$. Instead of observing the spaces $E_i$, we can talk about observing the labels $\theta_i$: we say that by observing $E$, the value $\theta_i$ will be seen with probability $\|x_i\|^2$.¹¹

¹¹ The original starting point is, of course, the opposite: instead of talking about observing subspaces, one talks about measuring some physical quantity and obtaining a real number as the outcome. However, here it seems logically more consistent to introduce these quantity values as labels of subspaces.

Example 7.3.2. The notion of observing the basis states $x_i$ is a special case: the observable $E$ can be defined as $E = \{E_1, \ldots, E_n\}$, where each $E_i$ is the one-dimensional subspace spanned by $x_i$. We can, for instance, equip $E_i$ with the label $i$. If the system is in state (7.15), the value $i$ is observed with probability $\|\alpha_ix_i\|^2 = |\alpha_i|^2$.

Another view of the observables can be achieved in the following way: if $H_n = E_1\oplus\cdots\oplus E_m$, let $P_{E_i}$ be the projection onto the subspace $E_i$. Thus, we may also think of observables as (sometimes partially defined) mappings $E : \mathbb{R}\to L(H_n)$ that associate a projection $E(\theta_i) = P_{E_i}$ to each label $\theta_i$. Following the notation (7.16), $E(\theta_i)x = x_i$, and the probability of observing the label $\theta_i$ when the system state is $|x\rangle$ is given by

$$\|x_i\|^2 = \langle x\mid E(\theta_i)x\rangle.$$

The mapping $E : \mathbb{R}\to L(H_n)$ has other interesting properties: the probability of observing either $\theta_i$ or $\theta_j$ is clearly

$$\|x_i\|^2 + \|x_j\|^2 = \langle x\mid(E(\theta_i) + E(\theta_j))x\rangle,$$

and the probability that some $\theta_i$ is observed is

$$1 = \langle x\mid Ix\rangle.$$

Thus, we may extend $E$ to be defined on sets of real numbers by setting $E(\{\theta_i, \theta_j\}) = E(\theta_i) + E(\theta_j)$ if $\theta_i\ne\theta_j$. We can easily see that, as a mapping from subsets of $\mathbb{R}$ to $L(H_n)$, $E$ satisfies the following conditions:

1. For each $X\subseteq\mathbb{R}$, $E(X)$ is a projection.
2. $E(\mathbb{R}) = I$.
3. $E(\bigcup X_i) = \sum E(X_i)$ for a disjoint sequence $X_i$ of sets.

A mapping $E$ that satisfies the above conditions is called a projection-valued measure.¹²

¹² The notion of a projection-valued measure is not, in general, defined on all subsets of $\mathbb{R}$, but on a $\sigma$-algebra which has enough "regular features". Typically, the Borel subsets of $\mathbb{R}$ will suffice. However, we are now interested in finite-level quantum systems, so there are only finitely many real numbers associated with each observable.

Viewing an observable as a projection-valued measure represents an alternative way of thinking compared with Definition 7.3.1. In that definition, we merely equipped each subspace with a real number, a label, which is thought to be observed instead of the actual subspace. Here we associate a projection, which defines a subspace, to a set of labels (real numbers). If $X$ is a set of real numbers, then $\langle x\mid E(X)x\rangle$ is just the probability of seeing a number in $X$. It follows from condition 3 that, if $X$ and $Y$ are disjoint sets, then the corresponding projections $E(X)$ and $E(Y)$ project onto mutually orthogonal subspaces. In fact, since $E(X)(H_n)$ is a subspace of $E(X\cup Y)(H_n)$, we must have $E(X)E(X\cup Y) = E(X)$ and, therefore,

$$E(X) = E(X)E(X\cup Y) = E(X) + E(X)E(Y),$$

so $E(X)E(Y) = 0$.

The third and perhaps the most traditional view of the observables is achieved by regarding an observable as a self-adjoint operator

$$A = \theta_1E(\theta_1) + \cdots + \theta_mE(\theta_m),$$

where the $E(\theta_i) = E(\{\theta_i\})$ are projections onto mutually orthogonal subspaces. Recalling that the probability of observing $\theta_i$, when the system is in state $x$, is $\langle x\mid E(\theta_i)x\rangle$, we can compute the expected value of an observable $A$ in state $x$:

$$\mathbb{E}_x(A) = \theta_1\langle x\mid E(\theta_1)x\rangle + \cdots + \theta_m\langle x\mid E(\theta_m)x\rangle = \langle x\mid(\theta_1E(\theta_1) + \cdots + \theta_mE(\theta_m))x\rangle = \langle x\mid Ax\rangle.$$

It now seems to be a good moment to collect together all the viewpoints on observables we have so far.

• An observable can be defined as a collection of mutually orthogonal subspaces $\{E_1, \ldots, E_m\}$ such that $H_n = E_1\oplus\cdots\oplus E_m$. Each subspace is equipped with a real-number label $\theta_i$. This point of view most closely resembles the original idea of thinking about a quantum state as a generalization of a probability distribution. In fact, observing the state of a quantum system can be seen as learning the value of an observable which is defined as a collection of one-dimensional subspaces spanned by the basis states.
• An observable can be seen as a projection-valued measure $E$ which maps a set of labels to a projection that defines a subspace. This viewpoint offers us a method for generalizing the notion of an observable into a positive operator-valued measure.
• The traditional view is to define an observable as a self-adjoint operator given by the spectral representation $A = \theta_1E(\theta_1) + \cdots + \theta_mE(\theta_m)$.

All these viewpoints are logically equivalent. In what follows, we will use all of them, choosing the one best suited to each situation.
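To connect the three viewpoints computationally, the added sketch below takes a self-adjoint observable (an arbitrary degenerate example), assembles the eigenprojections $E(\theta)$, and computes the outcome probabilities $\langle x\mid E(\theta)x\rangle$ and the expected value $\langle x\mid Ax\rangle$:

    import numpy as np

    A = np.diag([1.0, 2.0, 2.0]).astype(complex)  # observable with labels 1 and 2
    x = np.array([1, 1, 1], dtype=complex) / np.sqrt(3)

    eigvals, V = np.linalg.eigh(A)
    labels = np.unique(np.round(eigvals, 12))
    probs = {}
    for th in labels:
        idx = np.where(np.isclose(eigvals, th))[0]
        P = sum(np.outer(V[:, j], V[:, j].conj()) for j in idx)  # projection E(theta)
        probs[th] = np.vdot(x, P @ x).real                       # <x | E(theta) x>

    assert np.isclose(sum(probs.values()), 1)        # some label is always observed
    expected = sum(th * p for th, p in probs.items())
    assert np.isclose(expected, np.vdot(x, A @ x).real)   # E_x(A) = <x | A x>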
7.3.3 The Uncertainty Principles

In this section, we first present an interesting classical result connected with the simultaneous measurement of two observables. We will view the observables as self-adjoint mappings. We say that two observables $A$ and $B$ are commutative if $AB = BA$. We also define the commutator of $A$ and $B$ as the mapping $[A, B] = AB - BA$. Recall that, for a given state $x$, the expected value of an observable $A$ is given by $\mathbb{E}_x(A) = \langle x\mid Ax\rangle$. For short, we denote $\mu = \mathbb{E}_x(A)$. The variance of $A$ (in state $x$) is the expected value of the observable $(A-\mu)^2$. Notice that, if $A$ is self-adjoint, then $(A-\mu)^2$ is also self-adjoint and thus also defines an observable. With these notations, the variance can be expressed as

$$\mathrm{Var}_x(A) = \mathbb{E}_x((A-\mu)^2) = \langle x\mid(A-\mu)^2x\rangle = \langle(A-\mu)x\mid(A-\mu)x\rangle = \|(A-\mu)x\|^2. \qquad (7.17)$$

Theorem 7.3.2 (The Uncertainty Principle). For any observables $A$, $B$ and any state $x$,

$$\mathrm{Var}_x(A)\,\mathrm{Var}_x(B) \ge \frac14|\langle x\mid[A,B]x\rangle|^2.$$

Proof. Notice first that, for self-adjoint $A$ and $B$, $[A,B]^* = -[A,B]$, so the commutator of $A$ and $B$ can be written as $[A,B] = iC$, where $C = -i[A,B]$ is self-adjoint. If $\mu_A$ and $\mu_B$ are the expected values of $A$ and $B$, respectively, we get

$$\mathrm{Var}_x(A)\,\mathrm{Var}_x(B) = \|(A-\mu_A)x\|^2\|(B-\mu_B)x\|^2 \ge |\langle(A-\mu_A)x\mid(B-\mu_B)x\rangle|^2,$$

using representation (7.17) and the Cauchy-Schwarz inequality. To shorten the notations, we write $A_1 = A - \mu_A$ and $B_1 = B - \mu_B$. Clearly, $A_1$ and $B_1$ are also self-adjoint, and it is a straightforward task to verify that $[A_1, B_1] = [A, B]$ and that $A_1B_1 + B_1A_1$ is self-adjoint. Thus,

$$|\langle A_1x\mid B_1x\rangle|^2 = |\langle x\mid A_1B_1x\rangle|^2 = \Big|\Big\langle x\,\Big|\,\tfrac12(A_1B_1 + B_1A_1)x + \tfrac12(A_1B_1 - B_1A_1)x\Big\rangle\Big|^2 = \frac14|\langle x\mid(A_1B_1 + B_1A_1)x\rangle + i\langle x\mid Cx\rangle|^2,$$

where $C = -i[A,B]$ is a self-adjoint operator. Because of the self-adjointness of the operators involved, both inner products are real. Therefore,

$$\mathrm{Var}_x(A)\,\mathrm{Var}_x(B) \ge \frac14\left(\langle x\mid(A_1B_1 + B_1A_1)x\rangle^2 + \langle x\mid Cx\rangle^2\right) \ge \frac14\langle x\mid Cx\rangle^2,$$

which can also be written as

$$\mathrm{Var}_x(A)\,\mathrm{Var}_x(B) \ge \frac14|\langle x\mid[A,B]x\rangle|^2,$$

as claimed. □

Remark 7.3.2. A classical example of noncommutative observables is the pair position and momentum. It turns out that the commutator of these two observables is a homothety, i.e., multiplication by a nonzero constant, so Theorem 7.3.2 demonstrates that, in any state, the product of the variances of position and momentum has a positive lower bound. This is known as Heisenberg's uncertainty principle.
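Theorem 7.3.2 can be tested directly on random data; a minimal added sketch (the observables and the state are random examples):

    import numpy as np

    rng = np.random.default_rng(1)

    def rand_hermitian(n):
        M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
        return M + M.conj().T

    n = 4
    A, B = rand_hermitian(n), rand_hermitian(n)
    x = rng.normal(size=n) + 1j * rng.normal(size=n)
    x /= np.linalg.norm(x)

    def var(O, x):
        mu = np.vdot(x, O @ x).real            # expected value <x|Ox>
        d = (O - mu * np.eye(n)) @ x
        return np.vdot(d, d).real              # ||(O - mu)x||^2, cf. (7.17)

    comm = A @ B - B @ A
    lower = 0.25 * abs(np.vdot(x, comm @ x))**2
    assert var(A, x) * var(B, x) >= lower - 1e-9
    print(var(A, x) * var(B, x), ">=", lower)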
The uncertainty principle (Theorem 7.3.2) was criticized by D. Deutsch [22] on the grounds that the lower bound provided by Theorem 7.3.2 is not generally fixed, but depends on the state $x$. Deutsch himself gave in [22] a state-independent lower bound for the sum of the uncertainties of two observables. Here we present an improved version of this bound due to Maassen and Uffink [42]. We say that a sequence $(p_1, \ldots, p_n)$ of real numbers is a probability distribution if $p_i\ge 0$ and $p_1 + \cdots + p_n = 1$. For a probability distribution $P = (p_1, \ldots, p_n)$, the Shannon entropy of the distribution is defined to be

$$H(P) = -(p_1\log p_1 + \cdots + p_n\log p_n),$$

where $0\cdot\log 0$ is defined to be $0$. For the basic properties of Shannon entropy, see Section 8.5. Let $A$ and $B$ be some observables of an $n$-state quantum system (again regarded as self-adjoint operators). As discussed before, there are orthonormal bases of $H_n$ consisting of the eigenvectors of $A$ and of $B$. Denoting these bases by $\{a_1, \ldots, a_n\}$ and $\{b_1, \ldots, b_n\}$, we get spectral representations of $A$ and $B$:

$$A = \lambda_1|a_1\rangle\langle a_1| + \cdots + \lambda_n|a_n\rangle\langle a_n|, \qquad B = \mu_1|b_1\rangle\langle b_1| + \cdots + \mu_n|b_n\rangle\langle b_n|,$$

where the $\lambda_i$ and $\mu_i$ are the eigenvalues of $A$ and $B$, respectively. With respect to a fixed state $|x\rangle\in H_n$, the observables $A$ and $B$ define probability distributions $P = (p_1, \ldots, p_n)$ and $Q = (q_1, \ldots, q_n)$ that describe the probabilities of seeing the labels $\lambda_1, \ldots, \lambda_n$ and $\mu_1, \ldots, \mu_n$. Notice that the projections $E(\lambda_i)$ and $E(\mu_i)$ are $|a_i\rangle\langle a_i|$ and $|b_i\rangle\langle b_i|$, respectively. Therefore, by measuring the observable $A$ when the system is in state $|x\rangle$, the label $\lambda_i$ is observed with probability

$$p_i = \langle x\mid\!|a_i\rangle\langle a_i|\,x\rangle = \langle x\mid\langle a_i\mid x\rangle a_i\rangle = |\langle a_i\mid x\rangle|^2,$$

and by measuring $B$, the label $\mu_i$ is seen with probability $q_i = |\langle b_i\mid x\rangle|^2$. The probability distributions $P$ and $Q$ obey the following state-independent law:

Theorem 7.3.3 (Entropic Uncertainty Relation). $H(P) + H(Q) \ge -2\log c$, where $c = \max_{i,j}|\langle a_i\mid b_j\rangle|$.

Proof. The expressions $H(P)$ and $H(Q)$ seem somewhat difficult to estimate directly, so we should find something to replace them. Let $r\ne 0$ be small and study the expression

$$\Big(\sum_{i=1}^n p_i^{r+1}\Big)^{\frac1r}. \qquad (7.18)$$

We claim that

$$\lim_{r\to 0}\frac{\log\sum_{i=1}^n p_i^{r+1}}{r} = \sum_{i=1}^n p_i\log p_i. \qquad (7.19)$$

In fact, since $\sum_{i=1}^n p_i = 1$, the numerator and the denominator of the left-hand side of (7.19) both tend to $0$ as $r$ does. Therefore, l'Hospital's rule applies, and

$$\lim_{r\to 0}\frac{\log\sum_{i=1}^n p_i^{r+1}}{r} = \lim_{r\to 0}\frac{\sum_{i=1}^n p_i^{r+1}\log p_i}{\sum_{i=1}^n p_i^{r+1}} = \sum_{i=1}^n p_i\log p_i.$$

Notice also that (7.19) implies that

$$\lim_{r\to 0}\Big(\sum_{i=1}^n p_i^{r+1}\Big)^{\frac1r} = \prod_{i=1}^n p_i^{p_i},$$

and that the claim is equivalent to $\exp(-H(P) - H(Q))\le c^2$, which can be written as

$$\prod_{i=1}^n p_i^{p_i}\prod_{i=1}^n q_i^{q_i} \le c^2. \qquad (7.20)$$

Our strategy is to replace the products in (7.20) with expressions similar to (7.18), estimate those, and obtain (7.20) by letting $r\to 0$. To estimate (7.18), we use a theorem of M. Riesz [56]:¹³ if $T : H_n\to H_n$ is a unitary matrix, $y = Tx$, and $1\le a\le 2$, then

$$\Big(\sum_{i=1}^n|y_i|^{\frac{a}{a-1}}\Big)^{\frac{a-1}{a}} \le c^{\frac{2-a}{a}}\Big(\sum_{i=1}^n|x_i|^a\Big)^{\frac1a},$$

where $c = \max|T_{ij}|$.

¹³ Riesz's theorem is a special case of Stein's interpolation formula. The elementary proof given by Riesz [56] is based on Hölder's inequality.

By choosing $x_i = \langle a_i\mid x\rangle$ and $T_{ij} = \langle b_i\mid a_j\rangle$, we have $y_i = \langle b_i\mid x\rangle$. Clearly, $T$ is unitary. We can now write the theorem in the form

$$\Big(\sum_{i=1}^n|\langle b_i\mid x\rangle|^{\frac{a}{a-1}}\Big)^{\frac{a-1}{a}} \le c^{\frac{2-a}{a}}\Big(\sum_{i=1}^n|\langle a_i\mid x\rangle|^a\Big)^{\frac1a}. \qquad (7.21)$$

Since (7.21) can also be raised to any positive power $k$, we will try to fix the values so that (7.21) begins to resemble the right-hand side of (7.18). Therefore, we will search for numbers $r$, $s$, and $k$ such that $\frac{a}{a-1} = 2(s+1)$, $k\cdot\frac{a-1}{a} = \frac1s$, $a = 2(r+1)$, $\frac{k}{a} = -\frac1r$, and $k\cdot\frac{2-a}{a} = 2$. To satisfy $1\le a\le 2$, we must choose $r\in[-\frac12, 0)$. A simple calculation shows that choosing $s = -\frac{r}{2r+1}$ and $k = -\frac{2r+2}{r}$ will suffice. Thus, (7.21) can be rewritten as

$$\Big(\sum_{i=1}^n q_i^{s+1}\Big)^{\frac1s}\Big(\sum_{i=1}^n p_i^{r+1}\Big)^{\frac1r} \le c^2,$$

and, by letting $r\to 0$, we obtain the desired estimate (7.20). □
Remark 7.3.3. The lower bound of Theorem 7.3.3 does not depend on the system state $x$, but it may depend on the representation of the observables $A = \lambda_1|a_1\rangle\langle a_1| + \cdots + \lambda_n|a_n\rangle\langle a_n|$ and $B = \mu_1|b_1\rangle\langle b_1| + \cdots + \mu_n|b_n\rangle\langle b_n|$. Independence of the representation is obtained if both observables are non-degenerate: then all the vectors $a_i$ and $b_i$ are determined uniquely up to multiplicative factors of absolute value $1$; hence $c = \max|\langle b_i\mid a_j\rangle|$ is also uniquely determined. Notice also that the degeneracy of an observable means that there are fewer than $n$ values to be observed, so the entropy naturally decreases.

Example 7.3.3. Consider a system consisting of $m$ qubits. One orthonormal basis of $H_{2^m}$ is the familiar computational basis

$$B_1 = \{|x_1\ldots x_m\rangle \mid x_i\in\{0,1\}\},$$

and another one is $B_2 = \{H|x\rangle \mid |x\rangle\in B_1\}$, which is obtained by applying the Hadamard-Walsh transform (see Section 3.1.2 or 8.2.3)

$$H|x\rangle = \frac{1}{\sqrt{2^m}}\sum_{y\in\mathbb{F}_2^m}(-1)^{x\cdot y}|y\rangle$$

to $B_1$. We will study the observables $A_1$ and $A_2$ determined by $B_1$ and $B_2$:

$$A_i = \sum_{x\in B_i}c_x^{(i)}\,|x\rangle\langle x|.$$

Here we do not care very much about the labels $c_x^{(i)}$; they can be fixed in any manner such that both sequences $c_x^{(i)}$ consist of $2^m$ distinct real numbers. This, in fact, is to regard an observable as a decomposition of $H_{2^m}$ into $2^m$ mutually orthogonal one-dimensional subspaces. For any $|x\rangle\in B_1$ and $|x'\rangle\in B_2$, we have

$$|\langle x\mid x'\rangle| = \frac{1}{\sqrt{2^m}},$$

as is easily verified. Thus, the entropic uncertainty relation gives

$$H(P_1) + H(P_2) \ge -2\log\frac{1}{\sqrt{2^m}} = m\log 2.$$

That is, the sum of the uncertainties of observing an $m$-qubit state in the bases $B_1$ and $B_2$ is at least $m$ bits! On the other hand, the uncertainty of a single observation naturally cannot be more than $m$ bits: an observation with respect to any basis will give us a sequence $x = x_1\ldots x_m\in\{0,1\}^m$ with some probabilities $p_x$ such that $\sum_{x\in\mathbb{F}_2^m}p_x = 1$. The Shannon entropy

$$H = -\sum_{x\in\mathbb{F}_2^m}p_x\log p_x$$

achieves its maximum at $p_x = \frac{1}{2^m}$ for each $x\in\{0,1\}^m$; thus, the maximum uncertainty is $m$ bits.

The lower bound of the uncertainty principle (Theorem 7.3.2) depends on the commutator $[A,B]$ and on the state $|x\rangle$. If $[A,B] = 0$, then the lower bound is always trivial. Does a similar result also hold for the entropic uncertainty relation? The answer is yes: if $A$ and $B$ commute, then, according to Lemma 7.2.5, $A$ and $B$ share an orthonormal eigenvector basis. But then $c = \max|\langle b_i\mid a_j\rangle| = 1$ and the lower bound is trivial.
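Example 7.3.3 can be checked numerically; the added sketch below measures a random $m$-qubit state in the computational and in the Hadamard-Walsh basis and verifies that the two entropies sum to at least $m$ bits:

    import numpy as np

    m = 3
    H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    H = H1
    for _ in range(m - 1):            # Hadamard-Walsh transform on m qubits
        H = np.kron(H, H1)

    rng = np.random.default_rng(7)
    x = rng.normal(size=2**m) + 1j * rng.normal(size=2**m)
    x /= np.linalg.norm(x)

    P = np.abs(x)**2                  # outcome probabilities in basis B_1
    Q = np.abs(H.conj().T @ x)**2     # probabilities in basis B_2 = {H|y>}

    def entropy_bits(p):
        p = p[p > 1e-15]
        return float(-(p * np.log2(p)).sum())

    total = entropy_bits(P) + entropy_bits(Q)
    print(total, ">=", m)             # entropic uncertainty relation, c = 2^{-m/2}
    assert total >= m - 1e-9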
7.4 Quantum States as Operators

Let us study a system of two qubits in state

$$\frac{1}{\sqrt2}(|0\rangle|0\rangle + |1\rangle|1\rangle). \qquad (7.22)$$

In the introductory section we learned that this state is entangled; there is no way to write (7.22) as a product of two single-qubit states. This also means that the representation of quantum systems as Hilbert-space vectors does not offer us any vocabulary for speaking about the state of the first qubit alone when the compound system is in state (7.22). In this section, we will generalize the notion of a quantum state.

7.4.1 Density Matrices

Let $\{b_1, \ldots, b_n\}$ be a fixed basis of $H_n$. In this section, the coordinate representation of a vector $x = x_1b_1 + \cdots + x_nb_n$ is interpreted as a column vector $(x_1, \ldots, x_n)^T$, and we define the dual vector of $|x\rangle$ as the row vector $\langle x| = (x_1^*, \ldots, x_n^*)$. Notice that, for any $x = x_1b_1 + \cdots + x_nb_n$ and $y = y_1b_1 + \cdots + y_nb_n$, the inner product $\langle x\mid y\rangle$ appears as the matrix product

$$\langle x\mid y\rangle = (x_1^*, \ldots, x_n^*)\begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix}.$$

Inspired by the word "bracket", we will call the row vectors $\langle x|$ bra-vectors and the column vectors $|y\rangle$ ket-vectors.¹⁴

¹⁴ This terminology was first used by P. A. M. Dirac.

Remark 7.4.1. A coordinate-independent notion of dual vectors can be achieved by using the theorem of Fréchet and F. Riesz. Any vector $x\in H_n$ defines a linear function $f_x : H_n\to\mathbb{C}$ by

$$f_x(y) = \langle x\mid y\rangle. \qquad (7.23)$$

A linear function $f : H_n\to\mathbb{C}$ is called a functional. Functionals form a vector space $H_n^*$, the so-called dual space of $H_n$, where addition and scalar multiplication are defined pointwise. The Fréchet-Riesz theorem states that, for any functional $f : H_n\to\mathbb{C}$, there is a unique vector $x\in H_n$ such that $f = f_x$. According to this point of view, the dual vector of $x$ is the functional $f_x$ that satisfies (7.23).

For any state (vector of unit length) $x = x_1b_1 + \cdots + x_nb_n\in H_n$, we define the density matrix belonging to $x$ by

$$|x\rangle\otimes\langle x| = \begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}\otimes(x_1^*, x_2^*, \ldots, x_n^*) = \begin{pmatrix}|x_1|^2 & x_1x_2^* & \cdots & x_1x_n^*\\ x_2x_1^* & |x_2|^2 & \cdots & x_2x_n^*\\ \vdots & \vdots & \ddots & \vdots\\ x_nx_1^* & x_nx_2^* & \cdots & |x_n|^2\end{pmatrix}.$$

Remark 7.4.2. For density matrices, there is also a coordinate-independent viewpoint: it is not difficult to verify that $|x\rangle\otimes\langle x|$ is a matrix representing the projection onto the one-dimensional subspace generated by $x$, and we have already used the notation $|x\rangle\langle x|$ for such a projection. To shorten the notations, we will omit $\otimes$ and write $|x\rangle\otimes\langle x| = |x\rangle\langle x|$ hereafter. Recall also that we called states $|x\rangle$ and $|x_1\rangle$ equal if $x = e^{i\theta}x_1$ for some real number $\theta$; the density matrices are robust against the factor $e^{i\theta}$, i.e., $|x_1\rangle\langle x_1| = |x\rangle\langle x|$. On the other hand, for a one-dimensional subspace $W$, we can pick a unit-length vector $x$ which generates $W$. This vector is unique up to a multiplicative constant $e^{i\theta}$, and the projection onto $W$ is $|x\rangle\langle x|$. Thus, there is a bijective correspondence between the states and the density matrices belonging to unit vectors $x$.

We list three important properties of the density matrices $|x\rangle\langle x|$:

• Since $\|x\| = 1$, the matrices $|x\rangle\langle x|$ have unit trace (recall that the trace is the sum of the diagonal elements).
• As projections, the matrices $|x\rangle\langle x|$ are self-adjoint.
• The matrices $|x\rangle\langle x|$ are positive, since $(|x\rangle\langle x|)^2 = |x\rangle\langle x|$, and hence $\langle y\mid\!|x\rangle\langle x|\,y\rangle = \|\,|x\rangle\langle x|\,y\|^2\ge 0$ for each $y$.

Using these properties, we generalize the notion of a state:

Definition 7.4.1. A state of a quantum system is a positive, unit-trace, self-adjoint operator on $H_n$. Such an operator is called a density operator. Hereafter, any matrix representing a state is called a density matrix as well.

The spectral representation offers a powerful tool for studying the properties of density matrices. If $\rho$ is a density matrix with eigenvalues $\lambda_1, \ldots, \lambda_n$, then there is an orthonormal set $\{x_1, \ldots, x_n\}$ of eigenvectors of $\rho$. Thus, $\rho$ has a spectral representation

$$\rho = \lambda_1|x_1\rangle\langle x_1| + \cdots + \lambda_n|x_n\rangle\langle x_n|. \qquad (7.24)$$

The spectral representation (7.24) in fact tells us that each state (density matrix) is a linear combination of one-dimensional projections, which are alternative notations for the quantum states defined as unit-length vectors. We will call the states corresponding to one-dimensional projections vector states or pure states. States that are not pure are referred to as mixed. More information about density matrices is easily obtained: using the eigenvector basis, it is trivial to see that $\lambda_1 + \cdots + \lambda_n = \mathrm{Tr}(\rho) = 1$. Since $\rho$ is positive, we also see that $\lambda_i = \langle x_i\mid\rho x_i\rangle\ge 0$. Thus, we can say that each mixed state is a convex combination of pure states.
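The defining properties of a density matrix are straightforward to verify numerically; a minimal added sketch (the states are arbitrary examples, and the purity test via $\mathrm{Tr}(\rho^2)$ is a standard fact not spelled out in the text above):

    import numpy as np

    def density(x):
        """Density matrix |x><x| of a unit vector x."""
        return np.outer(x, x.conj())

    x1 = np.array([1, 0], dtype=complex)
    x2 = np.array([0, 1], dtype=complex)
    pure = density((x1 + x2) / np.sqrt(2))
    mixed = 0.5 * density(x1) + 0.5 * density(x2)   # convex combination of pure states

    for rho in (pure, mixed):
        assert np.isclose(np.trace(rho).real, 1)              # unit trace
        assert np.allclose(rho, rho.conj().T)                 # self-adjoint
        assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)      # positive

    # Tr(rho^2) = 1 exactly for pure states, < 1 for mixed ones.
    print(np.trace(pure @ pure).real, np.trace(mixed @ mixed).real)   # 1.0 vs 0.5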
A general density matrix therefore looks very much like a probability distribution over orthogonal pure states. In fact, each probability distribution $(p_1, \ldots, p_n)$ over mutually orthogonal vector states $x_1, \ldots, x_n$ defines a density matrix

$$p_1|x_1\rangle\langle x_1| + \cdots + p_n|x_n\rangle\langle x_n|,$$

but the converse is not so clear, for there is one unfortunate feature in the spectral decomposition (7.24): recall that (7.24) is unique if and only if $\rho$ has distinct eigenvalues. For instance, if $\lambda_1 = \lambda_2 = \lambda$, then it is easy to verify that we can always replace

$$\lambda|x_1\rangle\langle x_1| + \lambda|x_2\rangle\langle x_2| \quad\text{with}\quad \lambda|x'_1\rangle\langle x'_1| + \lambda|x'_2\rangle\langle x'_2|,$$

where $x'_1 = x_1\cos\alpha - x_2\sin\alpha$, $x'_2 = x_1\sin\alpha + x_2\cos\alpha$, and $\alpha$ is any real number (Exercise 5). Therefore, we cannot generally say, without additional information, that a density matrix represents a probability distribution over orthogonal state vectors.

We describe the time evolution of the generalized states only briefly at first; later, in Section 7.4.3, we will study the description of the time evolution in more detail. Here we merely use decomposition (7.24) straightforwardly: for a pure state $|x\rangle\langle x|$ there is a unitary mapping $U(t)$ such that

$$|x(t)\rangle\langle x(t)| = |U(t)x(0)\rangle\langle U(t)x(0)| = U(t)|x(0)\rangle\langle x(0)|U(t)^* \qquad (7.25)$$

(see Exercise 6). By extending this linearly to mixed states $\rho$, we have

$$\rho(t) = U(t)\rho(0)U(t)^*,$$

where $U(t)$ is the time-evolution mapping of the system. By using the representation $U(t) = e^{-itH}$, it is quite easy to verify (Exercise 7) that the Schrödinger equation takes the form

$$i\frac{d}{dt}\rho(t) = [H, \rho(t)].$$

Next, we will demonstrate how to fit the concept of observables to mixed states. For that purpose, we will regard an observable as a self-adjoint operator $A$ with spectral representation

$$A = \theta_1E(\theta_1) + \cdots + \theta_nE(\theta_n),$$

where each $E(\theta_i) = |x_i\rangle\langle x_i|$ is a projection such that the vectors $x_1, \ldots, x_n$ form an orthonormal basis of $H_n$. If the quantum system is in a state $x$ (pure state $|x\rangle\langle x|$), we can write $x = c_1x_1 + \cdots + c_nx_n$, where $c_i = \langle x_i\mid x\rangle$. Thus, the probability of observing a label $\theta_i$ is

$$\langle x\mid E(\theta_i)x\rangle = \Big\langle\sum_{j=1}^n\langle x_j\mid x\rangle x_j\,\Big|\,E(\theta_i)x\Big\rangle = \sum_{j=1}^n\langle x_j\mid E(\theta_i)\langle x\mid x_j\rangle x\rangle = \sum_{j=1}^n\langle x_j\mid E(\theta_i)\,|x\rangle\langle x|\,x_j\rangle = \mathrm{Tr}(E(\theta_i)\,|x\rangle\langle x|). \qquad (7.26)$$

Any state $T$ can be expressed as a convex combination

$$T = \lambda_1|x_1\rangle\langle x_1| + \cdots + \lambda_n|x_n\rangle\langle x_n| \qquad (7.27)$$

of pure states $|x_i\rangle\langle x_i|$, and we use the linearity of the trace to join the concept of observables together with the notion of states as self-adjoint mappings: let $T$ be a state of a quantum system and $E$ an observable regarded as a projection-valued measure. The probability that a real number (label) in the set $X$ is observed is

$$\mathrm{Tr}(E(X)T) = \mathrm{Tr}(TE(X)).$$

Notice that, since $\mathrm{Tr}(T) = 1$ and $E(X)$ is a projection, we always have $0\le\mathrm{Tr}(TE(X))\le 1$. This can be immediately seen by using the linearity of the trace and the spectral representation (7.27) of $T$:

$$\mathrm{Tr}(TE(X)) = \sum_{i=1}^n\lambda_i\mathrm{Tr}(|x_i\rangle\langle x_i|E(X)) = \sum_{i=1}^n\lambda_i\langle x_i\mid E(X)x_i\rangle.$$

The inequalities $0\le\mathrm{Tr}(TE(X))\le 1$ now follow directly from the facts that $0\le\langle x_i\mid E(X)x_i\rangle\le 1$, $\lambda_i\ge 0$, and $\sum_{i=1}^n\lambda_i = 1$. Thus, any state $T$ induces a probability distribution on the set of projections of the Hilbert space $H_n$ by $\mu_T(P) = \mathrm{Tr}(TP)$. If $n\ge 3$, the converse also holds.

Theorem 7.4.1 (A. Gleason, 1957). If $\dim H\ge 3$, then each probability measure $\mu : P(H)\to[0,1]$ satisfying

1. $\mu(0) = 0$, $\mu(I) = 1$,
2. $\mu(\sum P_i) = \sum\mu(P_i)$ for each sequence of mutually orthogonal projections $P_i$,

is generated via a state $T$ by $\mu_T(P) = \mathrm{Tr}(TP)$.

The proof of the above theorem can be found in [49].
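The density-matrix form of the time evolution and of the measurement rule can be sketched numerically as follows (an added illustration; the mixed state and the Hamiltonian are arbitrary examples):

    import numpy as np
    from scipy.linalg import expm

    rho0 = np.diag([0.75, 0.25]).astype(complex)   # a mixed qubit state
    H = np.array([[0, 1], [1, 0]], dtype=complex)  # a Hamilton operator

    t = 0.5
    U = lambda s: expm(-1j * s * H)
    rho_t = U(t) @ rho0 @ U(t).conj().T            # rho(t) = U(t) rho(0) U(t)*

    assert np.isclose(np.trace(rho_t).real, 1)     # evolution preserves the trace
    E0 = np.diag([1.0, 0.0]).astype(complex)       # projection E attached to one label
    print(np.trace(rho_t @ E0).real)               # probability Tr(T E(X))

    # i d/dt rho = [H, rho], checked at t by finite differences.
    eps = 1e-6
    d = (U(t+eps) @ rho0 @ U(t+eps).conj().T - U(t-eps) @ rho0 @ U(t-eps).conj().T) / (2*eps)
    assert np.allclose(1j * d, H @ rho_t - rho_t @ H, atol=1e-5)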
7.4.2 Subsystem States

We will now study the description of compound systems in more detail. Let $H_n$ and $H_m$ be the state spaces of two distinguishable quantum systems. As discussed before, the state space of the compound system is the tensor product $H_n\otimes H_m$, and a state of the compound system is a density operator on $H_n\otimes H_m$. Such a state can be composed of states of the subsystems in the following way: if $T_1$ and $T_2$ are states (also viewed as self-adjoint mappings) of $H_n$ and $H_m$, then the tensor product $T_1\otimes T_2$ defines a positive, unit-trace, self-adjoint mapping on $H_n\otimes H_m$, as is easily seen ($T_1\otimes T_2$ is defined by

$$(T_1\otimes T_2)(x_i\otimes y_j) = T_1x_i\otimes T_2y_j$$

for the basis states and extended to $H_n\otimes H_m$ by the linearity requirements). One can also easily verify that the matrix of $T_1\otimes T_2$ is the tensor product of the matrices of $T_1$ and $T_2$. Likewise, observables of the subsystems make up an observable of the compound system: if $A_1$ and $A_2$ are observables of the subsystems (now seen as self-adjoint mappings), then $A_1\otimes A_2$ is a self-adjoint mapping on $H_n\otimes H_m$. The identity mapping $I : H\to H$ is also self-adjoint and therefore defines an observable. However, this observable is a most uninformative one in the following sense: the corresponding projection-valued measure is defined by

$$E_I(X) = \begin{cases}I, & \text{if } 1\in X,\\ 0, & \text{otherwise},\end{cases}$$

so the probability distribution associated with $E_I$ is trivial: $\mathrm{Tr}(TE_I(X))$ is $1$ if $1\in X$, and $0$ otherwise. Notice also that $I$ has only one eigenvalue, $1$, but all the vectors of $H$ are eigenvectors of $I$. This strongly reflects on the corresponding decomposition of $H$ into orthogonal subspaces: the decomposition has only one component, $H$ itself. This corresponds to the idea that, by measuring the observable $I$, we are not observing any nontrivial property. In fact, if $T_1$ and $T_2$ are the states of $H_n$ and $H_m$, and $A_1$ and $A_2$ are some observables, then it is easy to see that

$$\mathrm{Tr}((T_1\otimes T_2)(A_1\otimes I)) = \mathrm{Tr}(T_1A_1), \qquad \mathrm{Tr}((T_1\otimes T_2)(I\otimes A_2)) = \mathrm{Tr}(T_2A_2).$$

That is, observing the compound observable $A_1\otimes I$ (resp. $I\otimes A_2$) corresponds to observing the first (resp. second) system only. Based on this idea, we will define the substates of a compound system.

Definition 7.4.2. Let $T$ be a state of a compound system with state space $H_n\otimes H_m$. The states of the subsystems are unit-trace positive self-adjoint mappings $T_1 : H_n\to H_n$ and $T_2 : H_m\to H_m$ such that, for any observables $A_1$ and $A_2$,

$$\mathrm{Tr}(T_1A_1) = \mathrm{Tr}(T(A_1\otimes I)), \qquad (7.28)$$
$$\mathrm{Tr}(T_2A_2) = \mathrm{Tr}(T(I\otimes A_2)). \qquad (7.29)$$

We say that $T_1$ is obtained by tracing over $H_m$, and $T_2$ by tracing over $H_n$. We can, of course, be skeptical about the existence and uniqueness of $T_1$ and $T_2$ but, fortunately, they are always uniquely determined.

Theorem 7.4.2. For each state $T$ there are unique states $T_1$ and $T_2$ satisfying (7.28) and (7.29) for all observables $A_1$ and $A_2$.

Proof. We will only show that $T_1$ exists and is unique; the proof for $T_2$ is symmetric. We will first search for a mapping $T_1$ that satisfies

$$\mathrm{Tr}(T_1A) = \mathrm{Tr}(T(A\otimes I)) \qquad (7.30)$$

for any linear mapping $A : H_n\to H_n$. For that purpose, we fix orthonormal bases $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_m\}$ of $H_n$ and $H_m$, respectively. Using these bases, equation (7.30) can be rewritten as

$$\sum_{i=1}^n\langle x_i\mid T_1Ax_i\rangle = \sum_{i=1}^n\sum_{j=1}^m\langle x_i\otimes y_j\mid T(A\otimes I)(x_i\otimes y_j)\rangle = \sum_{i=1}^n\sum_{j=1}^m\langle x_i\otimes y_j\mid T(Ax_i\otimes y_j)\rangle. \qquad (7.31)$$

It is easy to see that all the linear mappings $H_n\to H_n$ form a vector space which is generated by the mappings $|x_i\rangle\langle x_j|$. In fact, the matrix representation of the mapping $|x_i\rangle\langle x_j|$ is just the matrix having $0$s elsewhere but $1$ at the intersection of the $i$th row and the $j$th column. Therefore, $T_1$ can be represented as

$$T_1 = \sum_{i=1}^n\sum_{j=1}^n c_{ij}|x_i\rangle\langle x_j|, \qquad (7.32)$$

where the $c_{ij}$ are complex numbers.
By choosing $A = |x_k\rangle\langle x_l|$ and substituting (7.32) into (7.31), we obtain

$$c_{lk} = \sum_{j=1}^m\langle x_l\otimes y_j\mid T(x_k\otimes y_j)\rangle,$$

which gives

$$T_1 = \sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^m\langle x_i\otimes y_k\mid T(x_j\otimes y_k)\rangle\,|x_i\rangle\langle x_j| \qquad (7.33)$$

as a candidate for $T_1$. In fact, the mapping (7.33) is self-adjoint:

$$T_1^* = \sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^m\langle T(x_j\otimes y_k)\mid x_i\otimes y_k\rangle\,|x_j\rangle\langle x_i| = \sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^m\langle x_j\otimes y_k\mid T(x_i\otimes y_k)\rangle\,|x_j\rangle\langle x_i| = T_1.$$

By choosing $A = I$, we also see that $\mathrm{Tr}(T_1) = 1$. What about the positivity of $T_1$? Analogously to formula (7.26), we have

$$\langle x\mid T_1x\rangle = \mathrm{Tr}(T_1|x\rangle\langle x|) = \mathrm{Tr}(T(|x\rangle\langle x|\otimes I)) \ge 0,$$

so $T_1$ is also positive. This proves the existence of the required state $T_1$. Assume then, on the contrary, that there is another state $T'_1$ such that $\mathrm{Tr}(T_1A) = \mathrm{Tr}(T'_1A)$ for each self-adjoint $A$. By choosing $A$ to be a projection $|x\rangle\langle x|$, we see that $\langle x\mid(T_1 - T'_1)x\rangle = 0$ for any unit-length $x$. The mapping $T_1 - T'_1$ is also self-adjoint, so there is an orthonormal basis of $H_n$ consisting of eigenvectors of $T_1 - T'_1$. If all the eigenvalues of $T_1 - T'_1$ are $0$, then $T_1 - T'_1 = 0$ and the proof is complete. In the opposite case, there is a nonzero eigenvalue $\lambda_1$ of $T_1 - T'_1$ and a unit-length eigenvector $x_1$ belonging to $\lambda_1$. But then

$$0 = \langle x_1\mid(T_1 - T'_1)x_1\rangle = \lambda_1 \ne 0,$$

a contradiction. □

Let us write $T_1 = \mathrm{Tr}_{H_m}(T)$ for the state obtained by tracing over $H_m$ and, similarly, $T_2 = \mathrm{Tr}_{H_n}(T)$ for the state which we get by tracing over $H_n$. By collecting the facts from the previous proof, we see that

$$\mathrm{Tr}_{H_m}(T) = \sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^m\langle x_j\otimes y_k\mid T(x_i\otimes y_k)\rangle\,|x_j\rangle\langle x_i|. \qquad (7.34)$$

Similarly,

$$\mathrm{Tr}_{H_n}(T) = \sum_{i=1}^m\sum_{j=1}^m\sum_{k=1}^n\langle x_k\otimes y_i\mid T(x_k\otimes y_j)\rangle\,|y_i\rangle\langle y_j|. \qquad (7.35)$$

It is easy to see that the tracing-over operation is linear: $\mathrm{Tr}_{H_n}(\alpha S + \beta T) = \alpha\mathrm{Tr}_{H_n}(S) + \beta\mathrm{Tr}_{H_n}(T)$.

Example 7.4.1. Let the notation be as above. If $S$ is a state of the compound system $H_n\otimes H_m$, then $S$ is, in particular, a linear mapping $H_n\otimes H_m\to H_n\otimes H_m$. Therefore, $S$ can be uniquely represented as

$$S = \sum_{r=1}^n\sum_{t=1}^m\sum_{s=1}^n\sum_{u=1}^m S_{rstu}\,|x_r\otimes y_t\rangle\langle x_s\otimes y_u|. \qquad (7.36)$$

It is easy to verify that $|x_r\otimes y_t\rangle\langle x_s\otimes y_u| = |x_r\rangle\langle x_s|\otimes|y_t\rangle\langle y_u|$, so we can write

$$S = \sum_{r=1}^n\sum_{s=1}^n|x_r\rangle\langle x_s|\otimes\sum_{t=1}^m\sum_{u=1}^m S_{rstu}|y_t\rangle\langle y_u| = \sum_{r=1}^n\sum_{s=1}^n|x_r\rangle\langle x_s|\otimes S_{rs}, \qquad (7.37)$$

where each $S_{rs}\in L(H_m)$ (recall that the notation $L(H_m)$ stands for the linear mappings $H_m\to H_m$). It is worth noticing that the latter expression for $S$ is actually a decomposition of the $nm\times nm$ matrix (7.36) into an $n\times n$ block matrix having $m\times m$ matrices $S_{rs}$ as entries. Substituting (7.37) into (7.35) we get, after some calculation, that

$$S_2 = \sum_{r=1}^n\sum_{i=1}^m\sum_{j=1}^m\langle y_i\mid S_{rr}y_j\rangle\,|y_i\rangle\langle y_j| = \sum_{r=1}^nS_{rr}.$$

The above expression can be interpreted as follows: if the density matrix corresponding to the state $S$ is viewed as an $n\times n$ block matrix with $m\times m$ blocks $S_{rs}$, then the density matrix corresponding to the subsystem state $S_2$ is obtained by summing together all the diagonal blocks $S_{rr}$.
Example 7.4.2. Let us investigate the compound system $H_2\otimes H_2$ of two qubits. We choose $\{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}$ as the orthonormal basis of $H_2\otimes H_2$ and consider the (pure) state $y = \frac{1}{\sqrt2}(|00\rangle + |11\rangle)$ of the compound system. By using (7.34) and (7.35), we see that the states of both subsystems are

$$S' = \frac12|0\rangle\langle 0| + \frac12|1\rangle\langle 1|.$$

By using the familiar coordinate representations $|00\rangle = (1,0,0,0)^T$, $|01\rangle = (0,1,0,0)^T$, $|10\rangle = (0,0,1,0)^T$, and $|11\rangle = (0,0,0,1)^T$, we see that the coordinate representation of $y$ is $(\frac{1}{\sqrt2}, 0, 0, \frac{1}{\sqrt2})^T$, and it is easy to see that the density matrix corresponding to the pure state $|y\rangle\langle y|$ is

$$M_{|y\rangle\langle y|} = \frac12\begin{pmatrix}1&0&0&1\\ 0&0&0&0\\ 0&0&0&0\\ 1&0&0&1\end{pmatrix},$$

and the density matrix corresponding to the states $S'$ is

$$M_{S'} = \frac12\begin{pmatrix}1&0\\ 0&1\end{pmatrix},$$

which agrees with the previous example. On the other hand, if we try to reconstruct the state of $H_2\otimes H_2$ from the states $S'$, we find that

$$M_{S'}\otimes M_{S'} = \frac14\begin{pmatrix}1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1\end{pmatrix},$$

which is different from $M_{|y\rangle\langle y|}$. This demonstrates that the subsystem states $S'$ obtained by tracing over are not enough to determine the state of the compound system: even though the state of the compound system perfectly determines the subsystem states, there is no way to recover the whole system state from the partial states without additional information.
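Example 7.4.1's block-matrix recipe makes the partial trace a short computation; the added sketch below reshapes the Bell-state density matrix, recovers $M_{S'} = \frac12I$ by both tracing-over operations, and confirms that $M_{S'}\otimes M_{S'}$ differs from the original state:

    import numpy as np

    y = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)
    rho = np.outer(y, y.conj())                              # M_{|y><y|}

    def trace_over_second(S, n, m):
        """Tr_{H_m}: view S as an n x n block matrix of m x m blocks and
        take the ordinary trace of each block, cf. (7.34)."""
        blocks = S.reshape(n, m, n, m)
        return np.trace(blocks, axis1=1, axis2=3)

    def trace_over_first(S, n, m):
        """Tr_{H_n}: sum the n diagonal m x m blocks (Example 7.4.1)."""
        blocks = S.reshape(n, m, n, m)
        return sum(blocks[r, :, r, :] for r in range(n))

    S1 = trace_over_second(rho, 2, 2)
    S2 = trace_over_first(rho, 2, 2)
    assert np.allclose(S1, np.eye(2) / 2) and np.allclose(S2, np.eye(2) / 2)
    assert not np.allclose(np.kron(S1, S2), rho)   # substates do not determine the state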
Example 7.4.3. Let the notation be as before. If $z\in H_n\otimes H_m$ is a unit-length vector, then $|z\rangle\langle z|$ corresponds to a pure state of the system $H_n\otimes H_m$. For each pair $(x, z)\in H_n\times(H_n\otimes H_m)$ there exists a unique vector $y\in H_m$ such that

$$\langle y'\mid y\rangle = \langle x\otimes y'\mid z\rangle$$

for each $y'\in H_m$. In fact, if $y_1$ and $y_2$ were two such vectors, then $\langle y'\mid y_1 - y_2\rangle = 0$ for each $y'\in H_m$, and $y_1 = y_2$ follows immediately. On the other hand,

$$y = \sum_{k=1}^m\langle x\otimes y_k\mid z\rangle y_k$$

clearly satisfies the required condition. We define a mapping $(\cdot,\cdot) : H_n\times(H_n\otimes H_m)\to H_m$ by

$$(x, z) = \sum_{k=1}^m\langle x\otimes y_k\mid z\rangle y_k.$$

Clearly $(\cdot,\cdot)$ has properties that resemble the inner product, such as linearity with respect to the second component and anti-linearity¹⁵ with respect to the first component. Now, substituting $S = |z\rangle\langle z|$ into (7.35), we see that

$$S_2 = \sum_{k=1}^n|(x_k, z)\rangle\langle(x_k, z)|. \qquad (7.38)$$

¹⁵ Anti-linearity here, as usual, means that $(\alpha_1x_1 + \alpha_2x_2, z) = \alpha_1^*(x_1, z) + \alpha_2^*(x_2, z)$.

Comparing (7.38) with the formula

$$\mathrm{Tr}(S) = \sum_{k=1}^n\langle x_k\mid Sx_k\rangle$$

somehow explains the name "tracing over". Notice also that if

$$z = \sum_{i=1}^n\sum_{j=1}^m c_{ij}\,x_i\otimes y_j = \sum_{i=1}^n x_i\otimes\sum_{j=1}^m c_{ij}y_j,$$

then clearly $(x_k, z) = \sum_{j=1}^m c_{kj}y_j$, and $S_2$ becomes

$$S_2 = \sum_{k=1}^n\Big|\sum_{j=1}^m c_{kj}y_j\Big\rangle\Big\langle\sum_{j=1}^m c_{kj}y_j\Big|.$$

7.4.3 More on Time Evolution

We already know how to interpret abstract mathematical concepts such as the state (a positive, unit-trace self-adjoint operator) and the observable (which can be seen as a self-adjoint operator). Since these concepts refer to physical objects, it is equally important to know how the description of the system changes in time; in other words, we should know how to describe the dynamical evolution of the system. We have already outlined in Section 7.3.1 how the time evolution can be described in the state-vector formalism. Here we will discuss the time evolution in the general formalism in more detail. In the so-called Schrödinger picture, the state of the system may change in time while the observables remain fixed, whereas in the Heisenberg picture the observables change. It turns out that both pictures are mathematically equivalent, and here we will adopt the Schrödinger picture. We will also adopt, as in Section 7.3.1, the causality principle, which tells us that the state of the system at some time $t$ can be recovered from the state at a given time $t = 0$. Mathematically speaking, this means that if the state of the system $H_n$ at $t = 0$ is $S_0$, there exists a function $V_t : L(H_n)\to L(H_n)$ such that the state at a given time $t$ is $S_t = V_t(S_0)$. It follows that each $V_t$ should preserve self-adjointness, trace, and positivity. A requirement potentially subject to more criticism is that each $V_t$ is required to be linear. It is also quite common to require that each $V_t$ is completely positive. To define the concept of a completely positive linear mapping $L(H_n)\to L(H_n)$, we first consider some basic properties of tensor products. For some natural number $m\ge 1$, let $H_m$ be a Hilbert space with orthonormal basis $\{y_1, \ldots, y_m\}$. If $\{x_1, \ldots, x_n\}$ is also an orthonormal basis of $H_n$, then $\{y_i\otimes x_j\mid i\in\{1,\ldots,m\}, j\in\{1,\ldots,n\}\}$ is a natural choice for an orthonormal basis of $H_m\otimes H_n$. We will keep this notation throughout this section. Each linear mapping $W\in L(H_m\otimes H_n)$ can be uniquely represented as

$$W = \sum_{r=1}^m\sum_{s=1}^m|y_r\rangle\langle y_s|\otimes A_{rs}, \qquad (7.39)$$

where each $A_{rs}\in L(H_n)$ (see Example 7.4.1). Whenever $W$ is represented as in (7.39), knowing each $A_{rs}$ uniquely determines $W$ and vice versa, since

$$\langle y_k\otimes x_i\mid W(y_l\otimes x_j)\rangle = \langle x_i\mid A_{kl}x_j\rangle,$$

and any linear mapping $A_{kl}\in L(H_n)$ is completely determined by the values $\langle x_i\mid A_{kl}x_j\rangle$. In fact, this is already quite clear from the fact that (7.39) actually represents an $mn\times mn$ matrix $W$ as an $m\times m$ matrix with $n\times n$ matrices as entries. If now $V : L(H_n)\to L(H_n)$ is a linear mapping and $W$ is as in (7.39), we define another mapping $I_m\otimes V : L(H_m\otimes H_n)\to L(H_m\otimes H_n)$ by

$$(I_m\otimes V)(W) = \sum_{r=1}^m\sum_{s=1}^m|y_r\rangle\langle y_s|\otimes V(A_{rs}).$$

Definition 7.4.3. A mapping $V : L(H_n)\to L(H_n)$ is completely positive if, for any $m\ge 1$, the extended mapping $I_m\otimes V$ preserves positivity.

Remark 7.4.3. The intuitive meaning of the notion "completely positive" is the following: instead of merely regarding $T$ as a state of a quantum system $H_n$, we could as well introduce an "environment" $H_m$ and consider states of the compound system $H_m\otimes H_n$. The requirement that $V$ should be completely positive means that the induced time evolution $I_m\otimes V$ on the compound system must preserve positivity as well.

7.4.4 Representation Theorems

We will derive two representation theorems for completely positive mappings. For that purpose, we begin with some auxiliary results.

Lemma 7.4.1. A positive mapping $A\in L(H_n)$ is self-adjoint.

Proof. Positivity of $A$ means that $\langle x\mid Ax\rangle\ge 0$ for each $x\in H_n$. In particular, $\langle x\mid Ax\rangle$ is real; hence $\langle x\mid Ax\rangle = \langle Ax\mid x\rangle$ for each $x\in H_n$. On the other hand, also $\langle x\mid Ax\rangle = \langle A^*x\mid x\rangle$ for each $x\in H_n$. Writing $B = i(A - A^*)$, we have that $\langle Bx\mid x\rangle = 0$ for any $x\in H_n$. Since $B$ is self-adjoint, there exists an orthonormal basis $\{x_1, \ldots, x_n\}$ consisting of eigenvectors of $B$ with eigenvalues $\lambda_1, \ldots, \lambda_n$; see the spectral representation (7.8). But then

$$0 = \langle Bx_i\mid x_i\rangle = \lambda_i^*\langle x_i\mid x_i\rangle = \lambda_i^*$$

for each $i\in\{1,\ldots,n\}$, and therefore $B = 0$. The equality $A = A^*$ follows immediately. □
Lemma 7.4.2. A mapping $A\in L(H_n)$ is positive if and only if there is an orthonormal basis $\{x_1, \ldots, x_n\}$ of $H_n$ and non-negative numbers $\lambda_1, \ldots, \lambda_n$ such that

$$A = \lambda_1|x_1\rangle\langle x_1| + \cdots + \lambda_n|x_n\rangle\langle x_n|. \qquad (7.40)$$

Proof. Assume that $A$ has representation (7.40). Each $x\in H_n$ can be written as $x = \alpha_1x_1 + \cdots + \alpha_nx_n$, and then

$$\langle x\mid Ax\rangle = \lambda_1|\alpha_1|^2 + \cdots + \lambda_n|\alpha_n|^2 \ge 0;$$

hence $A$ is positive. On the other hand, if $A$ is positive, then $A$ is also self-adjoint by the previous lemma. Using the spectral representation, we see that there is an orthonormal basis $\{x_1, \ldots, x_n\}$ of $H_n$ such that $A = \lambda_1|x_1\rangle\langle x_1| + \cdots + \lambda_n|x_n\rangle\langle x_n|$ for some real numbers $\lambda_1, \ldots, \lambda_n$. Since $A$ is positive, we have $\lambda_i = \langle x_i\mid Ax_i\rangle\ge 0$ for each $i$. □

The proof of the following theorem, the first structure theorem, is due to [17].

Theorem 7.4.3. A linear mapping $V : L(H_n)\to L(H_n)$ is completely positive if and only if there exist $n^2$ mappings $V_i\in L(H_n)$ such that

$$V(A) = \sum_{i=1}^{n^2}V_iAV_i^* \qquad (7.41)$$

for each $A\in L(H_n)$.

Proof. If $V$ has representation (7.41), we should show that whenever

$$W = \sum_{r=1}^m\sum_{s=1}^m|y_r\rangle\langle y_s|\otimes A_{rs}$$

is positive, then

$$(I_m\otimes V)(W) = \sum_{r=1}^m\sum_{s=1}^m|y_r\rangle\langle y_s|\otimes\sum_{i=1}^{n^2}V_iA_{rs}V_i^*$$

is also positive. This is done by straightforward computation: any $z\in H_m\otimes H_n$ can be represented in the form

$$z = \sum_{a=1}^N y_a\otimes x_a,$$

where each $y_a\in H_m$ and $x_a\in H_n$. Writing

$$z_i = \sum_{a=1}^N y_a\otimes V_i^*x_a,$$

it is easy to check that

$$\langle z\mid(I_m\otimes V)(W)z\rangle = \sum_{i=1}^{n^2}\langle z_i\mid Wz_i\rangle \ge 0,$$

because $W$ is positive.

For the other direction, let $V$ be a completely positive mapping. Then, in particular, $I_n\otimes V$ should preserve positivity. For clarity, we will denote the basis of one copy of $H_n$ by $\{y_1, \ldots, y_n\}$ and that of the other copy by $\{x_1, \ldots, x_n\}$. The mapping $W\in L(H_n\otimes H_n)$ defined by

$$W = \sum_{r=1}^n\sum_{s=1}^n|y_r\rangle\langle y_s|\otimes|x_r\rangle\langle x_s| = \Big|\sum_{r=1}^ny_r\otimes x_r\Big\rangle\Big\langle\sum_{s=1}^ny_s\otimes x_s\Big|$$

is clearly positive, since from the last representation it can be directly read that $W$ is a constant multiple of a one-dimensional projection. Since $V$ is completely positive, it follows that

$$(I_n\otimes V)(W) = \sum_{r=1}^n\sum_{s=1}^n|y_r\rangle\langle y_s|\otimes V(|x_r\rangle\langle x_s|) \qquad (7.42)$$

is a positive mapping. By Lemma 7.4.2 there exist an orthonormal basis $\{v_1, \ldots, v_{n^2}\}$ of $H_n\otimes H_n$ and non-negative numbers $\lambda_1, \ldots, \lambda_{n^2}$ such that

$$(I_n\otimes V)(W) = \sum_{i=1}^{n^2}\lambda_i|v_i\rangle\langle v_i|,$$

which can also be written as

$$(I_n\otimes V)(W) = \sum_{i=1}^{n^2}|\sqrt{\lambda_i}\,v_i\rangle\langle\sqrt{\lambda_i}\,v_i|. \qquad (7.43)$$

If

$$v = \sum_{j=1}^n\sum_{i=1}^n\nu_{ij}\,y_j\otimes x_i$$

is a vector in $H_n\otimes H_n$, we associate with $v$ a mapping $V_v\in L(H_n)$ by

$$V_v = \sum_{i=1}^n\sum_{j=1}^n\nu_{ij}\,|x_i\rangle\langle x_j|. \qquad (7.44)$$

Then a straightforward computation gives

$$|v\rangle\langle v| = \sum_{r=1}^n\sum_{s=1}^n|y_r\rangle\langle y_s|\otimes V_v|x_r\rangle\langle x_s|V_v^*. \qquad (7.45)$$

Associating with each $\sqrt{\lambda_i}\,v_i$ in (7.43) a mapping $V_i$ as in (7.44), and using (7.45), we see that

$$(I_n\otimes V)(W) = \sum_{i=1}^{n^2}\sum_{r=1}^n\sum_{s=1}^n|y_r\rangle\langle y_s|\otimes V_i|x_r\rangle\langle x_s|V_i^* = \sum_{r=1}^n\sum_{s=1}^n|y_r\rangle\langle y_s|\otimes\sum_{i=1}^{n^2}V_i|x_r\rangle\langle x_s|V_i^*,$$

which, together with (7.42), gives that, for each pair $r, s$,

$$V(|x_r\rangle\langle x_s|) = \sum_{i=1}^{n^2}V_i|x_r\rangle\langle x_s|V_i^*.$$

Since the mappings $|x_r\rangle\langle x_s|$ span the whole space $L(H_n)$, the equation

$$V(A) = \sum_{i=1}^{n^2}V_iAV_i^*$$

holds for each $A\in L(H_n)$. □

Another quite useful criterion for complete positivity can be gathered from the proof of the previous theorem:

Theorem 7.4.4. A linear mapping $V : L(H_n)\to L(H_n)$ is completely positive if and only if

$$T = \sum_{r=1}^n\sum_{s=1}^n|y_r\rangle\langle y_s|\otimes V(|x_r\rangle\langle x_s|)$$

is a positive mapping $H_n\otimes H_n\to H_n\otimes H_n$.

Proof. If $V$ is a completely positive mapping, then $T$ is positive by the very definition, since

$$\sum_{r=1}^n\sum_{s=1}^n|y_r\rangle\langle y_s|\otimes|x_r\rangle\langle x_s|$$

is positive. On the other hand, if $T$ is positive, then a representation $V(A) = \sum_{i=1}^{n^2}V_iAV_i^*$ can be found exactly as in the proof of the previous theorem. □
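Theorem 7.4.4 is a practical test: assemble the block matrix $T$ (often called the Choi matrix), check its positivity, and read the mappings $V_i$ of (7.41) off its eigendecomposition via (7.43)-(7.44). The added sketch below does this for the qubit bit-flip map, an arbitrary example:

    import numpy as np

    n = 2
    p = 0.3
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    V_map = lambda A: (1 - p) * A + p * (X @ A @ X)     # a completely positive map

    # T = sum_{r,s} |y_r><y_s| (x) V(|x_r><x_s|), cf. Theorem 7.4.4.
    T = np.zeros((n*n, n*n), dtype=complex)
    for r in range(n):
        for s in range(n):
            E_rs = np.zeros((n, n), dtype=complex); E_rs[r, s] = 1
            T[r*n:(r+1)*n, s*n:(s+1)*n] = V_map(E_rs)

    eigvals, vecs = np.linalg.eigh(T)
    assert np.all(eigvals >= -1e-12)        # T positive <=> V completely positive

    # V_v from (7.44): the coordinates of sqrt(l) v, rearranged into an n x n matrix.
    Vs = [np.sqrt(l) * vecs[:, i].reshape(n, n).T
          for i, l in enumerate(eigvals) if l > 1e-12]
    A = np.array([[1, 2j], [0, 1]], dtype=complex)      # an arbitrary test mapping
    assert np.allclose(sum(Vi @ A @ Vi.conj().T for Vi in Vs), V_map(A))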
Recall that we also required that $V$ should preserve the trace.

Lemma 7.4.3. Let $V : L(H_n)\to L(H_n)$ be a completely positive mapping represented as $V(A) = \sum_{i=1}^{n^2}V_iAV_i^*$. Then $V$ preserves the trace if and only if $\sum_{i=1}^{n^2}V_i^*V_i = I$.

Proof. Let $\{x_1, \ldots, x_n\}$ be an orthonormal basis of $H_n$. Since $I = \sum_{r=1}^n|x_r\rangle\langle x_r|$, we obtain by direct calculation that

$$\Big\langle x_k\,\Big|\,\sum_{i=1}^{n^2}V_i^*V_ix_l\Big\rangle = \sum_{i=1}^{n^2}\langle V_ix_k\mid V_ix_l\rangle = \sum_{i=1}^{n^2}\sum_{r=1}^n\langle V_ix_k\mid x_r\rangle\langle x_r\mid V_ix_l\rangle = \mathrm{Tr}\Big(\sum_{i=1}^{n^2}V_i|x_l\rangle\langle x_k|V_i^*\Big) = \mathrm{Tr}(V(|x_l\rangle\langle x_k|)).$$

If $\sum_{i=1}^{n^2}V_i^*V_i = I$, then

$$\mathrm{Tr}(V(|x_l\rangle\langle x_k|)) = \langle x_k\mid x_l\rangle = \mathrm{Tr}(|x_l\rangle\langle x_k|),$$

i.e., $V$ preserves the trace of the mappings $|x_l\rangle\langle x_k|$. Since $V$ and the trace are linear and the mappings $|x_l\rangle\langle x_k|$ generate the whole space $L(H_n)$, $V$ preserves all traces. On the other hand, if $V$ preserves all traces, then the above computation shows that $\langle x_k\mid\sum_{i=1}^{n^2}V_i^*V_ix_l\rangle = \langle x_k\mid x_l\rangle$ for all $k$ and $l$, so $\sum_{i=1}^{n^2}V_i^*V_i = I$, since any linear mapping $A\in L(H_n)$ is determined by the values $\langle x_k\mid Ax_l\rangle$. □

The second structure theorem for completely positive mappings connects them to unitary mappings and offers an interesting interpretation.

Theorem 7.4.5. A linear mapping $V : L(H_n)\to L(H_n)$ is completely positive and trace-preserving if and only if there exist a unitary mapping $U\in L(H_n\otimes H_{n^2})$ and a pure state $B = |b\rangle\langle b|$ of the system $L(H_{n^2})$ such that

$$V(A) = \mathrm{Tr}_{H_{n^2}}(U(A\otimes B)U^*)$$

for each $A\in L(H_n)$.

Remark 7.4.4. A unitary time evolution $|x\rangle\langle x|\mapsto|Ux\rangle\langle Ux|$ can be rewritten as $|x\rangle\langle x|\mapsto U|x\rangle\langle x|U^*$, so the above theorem states that any completely positive, trace-preserving mapping $L(H_n)\to L(H_n)$ can be interpreted as a unitary time evolution in a larger system.

Proof. Assume first that $V(A) = \mathrm{Tr}_{H_{n^2}}(U(A\otimes B)U^*)$. Let $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_{n^2}\}$ be orthonormal bases of $H_n$ and $H_{n^2}$, respectively. We write $U^*$ in the form

$$U^* = \sum_{r=1}^n\sum_{s=1}^n|x_r\rangle\langle x_s|\otimes U^*_{rs}$$

and define $V_k\in L(H_n)$ as

$$V_k = \sum_{i=1}^n\sum_{j=1}^n\langle U^*_{ji}y_k\mid b\rangle\,|x_i\rangle\langle x_j|.$$

If now $A = \sum_{i=1}^n\sum_{j=1}^n a_{ij}|x_i\rangle\langle x_j|$, a direct calculation gives that

$$V_kAV_k^* = \sum_{i=1}^n\sum_{j=1}^n\sum_{r=1}^n\sum_{t=1}^n\langle x_r\mid Ax_t\rangle\langle U^*_{ri}y_k\mid b\rangle\langle b\mid U^*_{tj}y_k\rangle\,|x_i\rangle\langle x_j|. \qquad (7.46)$$

On the other hand, (7.34) gives that

$$\mathrm{Tr}_{H_{n^2}}(U(A\otimes B)U^*) = \sum_{k=1}^{n^2}\sum_{i=1}^n\sum_{j=1}^n\sum_{r=1}^n\sum_{t=1}^n\langle x_r\mid Ax_t\rangle\langle U^*_{ri}y_k\mid b\rangle\langle b\mid U^*_{tj}y_k\rangle\,|x_i\rangle\langle x_j|,$$

which, using (7.46), can be written as

$$\mathrm{Tr}_{H_{n^2}}(U(A\otimes B)U^*) = \sum_{k=1}^{n^2}V_kAV_k^*.$$

Thus, by Theorem 7.4.3, $V$ is completely positive. To see that $V$ is also trace-preserving, notice first that a unitary mapping $U$ converts an orthonormal basis $\mathcal{B}$ into another orthonormal basis, so

$$\mathrm{Tr}(UTU^*) = \sum_{z\in\mathcal{B}}\langle z\mid UTU^*z\rangle = \sum_{z\in\mathcal{B}}\langle U^*z\mid TU^*z\rangle = \sum_{z'\in U^*\mathcal{B}}\langle z'\mid Tz'\rangle = \mathrm{Tr}(T),$$

since the trace is independent of the orthonormal basis chosen (Exercise 1). Since the tracing-over operation $\mathrm{Tr}_{H_{n^2}} : L(H_n\otimes H_{n^2})\to L(H_n)$, $T\mapsto T_1 = \mathrm{Tr}_{H_{n^2}}(T)$, is defined by the condition

$$\mathrm{Tr}(T_1P) = \mathrm{Tr}(T(P\otimes I))$$

for each projection $P\in L(H_n)$, we can use $P = I_n$ to obtain

$$\mathrm{Tr}(\mathrm{Tr}_{H_{n^2}}(U(A\otimes B)U^*)) = \mathrm{Tr}(U(A\otimes B)U^*) = \mathrm{Tr}(A\otimes B) = \mathrm{Tr}(A)\mathrm{Tr}(B) = \mathrm{Tr}(A).$$

On the other hand, let $V : L(H_n)\to L(H_n)$ be a completely positive, trace-preserving linear mapping. By Theorem 7.4.3, $V$ can be represented as

$$V(A) = \sum_{i=1}^{n^2}V_iAV_i^*,$$

where each $V_i\in L(H_n)$ and, by Lemma 7.4.3,

$$\sum_{i=1}^{n^2}V_i^*V_i = I.$$

Fix an arbitrary vector $b\in H_{n^2}$ of unit length. For any $x\in H_n$ we define

$$U(x\otimes b) = \sum_{i=1}^{n^2}V_ix\otimes y_i. \qquad (7.47)$$

Clearly $U$ is linear with respect to $x$. If $x_1, x_2\in H_n$, a direct computation gives

$$\langle U(x_1\otimes b)\mid U(x_2\otimes b)\rangle = \sum_{i=1}^{n^2}\sum_{j=1}^{n^2}\langle V_ix_1\otimes y_i\mid V_jx_2\otimes y_j\rangle = \sum_{i=1}^{n^2}\langle V_ix_1\mid V_ix_2\rangle = \sum_{i=1}^{n^2}\langle x_1\mid V_i^*V_ix_2\rangle = \langle x_1\mid x_2\rangle.$$

Denoting $H_n\otimes b = \{x\otimes b\mid x\in H_n\}\subseteq H_n\otimes H_{n^2}$, we see that $U : H_n\otimes b\to H_n\otimes H_{n^2}$ is a linear mapping which preserves inner products. It follows that $U$ can be extended (usually in many ways) to a unitary mapping $U'\in L(H_n\otimes H_{n^2})$. We fix one extension and denote it again by $U$. If $x$ is any unit-length vector, then

$$U(|x\rangle\langle x|\otimes|b\rangle\langle b|)U^* = U|x\otimes b\rangle\langle x\otimes b|U^* = |U(x\otimes b)\rangle\langle U(x\otimes b)| = \Big|\sum_{i=1}^{n^2}V_ix\otimes y_i\Big\rangle\Big\langle\sum_{j=1}^{n^2}V_jx\otimes y_j\Big|.$$

Using Example 7.4.3, it is easy to see that

$$\mathrm{Tr}_{H_{n^2}}(U(|x\rangle\langle x|\otimes|b\rangle\langle b|)U^*) = \sum_{i=1}^{n^2}V_i|x\rangle\langle x|V_i^* = V(|x\rangle\langle x|).$$

Hence, with $U$ defined as in (7.47) and $B = |b\rangle\langle b|$, the claim holds at least for all pure states $|x\rangle\langle x|$. Since the linear mappings $A\mapsto\mathrm{Tr}_{H_{n^2}}(U(A\otimes B)U^*)$ and $A\mapsto V(A)$ agree on all pure states, and all states can be expressed as convex combinations of pure states, the mappings must be equal. □
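The construction (7.47) can be carried out numerically: build the columns $U(e_j\otimes b)$ from the mappings $V_i$, complete them to a unitary, and verify $V(A) = \mathrm{Tr}_{H_{n^2}}(U(A\otimes B)U^*)$. The sketch below is an added illustration under these assumptions, with the depolarizing qubit channel as an arbitrary example:

    import numpy as np

    n = 2; ne = n * n                  # system dimension and environment dimension n^2
    p = 0.5
    I2 = np.eye(2, dtype=complex)
    Xm = np.array([[0, 1], [1, 0]], dtype=complex)
    Ym = np.array([[0, -1j], [1j, 0]])
    Zm = np.diag([1.0, -1.0]).astype(complex)
    Vs = [np.sqrt(1 - 3*p/4) * I2] + [np.sqrt(p/4) * S for S in (Xm, Ym, Zm)]
    assert np.allclose(sum(Vi.conj().T @ Vi for Vi in Vs), I2)   # Lemma 7.4.3

    b = np.zeros(ne, dtype=complex); b[0] = 1                    # environment state |b>
    cols = [sum(np.kron(Vi @ np.eye(n)[:, j], np.eye(ne)[:, i])  # U(e_j (x) b), cf. (7.47)
                for i, Vi in enumerate(Vs)) for j in range(n)]

    # Extend the isometry to a unitary by Gram-Schmidt over the standard basis.
    basis = list(cols)
    for e in np.eye(n * ne, dtype=complex).T:
        v = e - sum(np.vdot(u, e) * u for u in basis)
        if np.linalg.norm(v) > 1e-10:
            basis.append(v / np.linalg.norm(v))
    U = np.zeros((n * ne, n * ne), dtype=complex)
    extra = iter(basis[n:])
    for j in range(n):
        U[:, j * ne] = cols[j]         # e_j (x) b sits at column index j*ne
    for col in range(n * ne):
        if col % ne != 0:
            U[:, col] = next(extra)
    assert np.allclose(U.conj().T @ U, np.eye(n * ne))           # U is unitary

    rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)      # an arbitrary state
    big = U @ np.kron(rho, np.outer(b, b.conj())) @ U.conj().T
    traced = big.reshape(n, ne, n, ne).trace(axis1=1, axis2=3)   # trace over H_{n^2}
    assert np.allclose(traced, sum(Vi @ rho @ Vi.conj().T for Vi in Vs))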
7.5 Exercises

1. Show that the trace function

$$\mathrm{Tr}(A) = \sum_{i=1}^n\langle x_i\mid Ax_i\rangle$$

is independent of the chosen orthonormal basis $\{x_1, \ldots, x_n\}$.

2. Show that if $P$ is a self-adjoint operator $H_n\to H_n$ and $P^2 = P$, then there is a subspace $W\subseteq H_n$ such that $P = P_W$.

3. If $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of a self-adjoint operator $A$ and $x_1, \ldots, x_n$ is the corresponding orthonormal set of eigenvectors, verify that

$$A = \lambda_1|x_1\rangle\langle x_1| + \cdots + \lambda_n|x_n\rangle\langle x_n|.$$

4. Verify that the polarization equation

$$\langle x\mid y\rangle = \frac14\left(\|x+y\|^2 - \|x-y\|^2 - i\|x+iy\|^2 + i\|x-iy\|^2\right)$$

holds. Conclude that if $\|Ax\| = \|x\|$ for any $x\in H_n$, then also $\langle Ax\mid Ay\rangle = \langle x\mid y\rangle$ for each $x, y\in H_n$.

5. Prove that

$$\lambda|x_1\rangle\langle x_1| + \lambda|x_2\rangle\langle x_2| = \lambda|x'_1\rangle\langle x'_1| + \lambda|x'_2\rangle\langle x'_2|,$$

where $x'_1 = x_1\cos\alpha - x_2\sin\alpha$, $x'_2 = x_1\sin\alpha + x_2\cos\alpha$, and $\alpha$ is any real number.

6. Prove that $|Ax\rangle\langle By| = A|x\rangle\langle y|B^*$ for each $x, y\in H_n$ and any operators $A, B : H_n\to H_n$.

7. Derive the generalized Schrödinger equation

$$i\frac{d}{dt}\rho(t) = [H, \rho(t)],$$

taking the equation $\rho(t) = U(t)\rho(0)U(t)^*$ and the representation $U(t) = e^{-itH}$ as starting points.

8. Appendix B: Mathematical Background

The purpose of this chapter is to introduce the reader to the basic mathematical notions used in this book.

8.1 Group Theory

8.1.1 Preliminaries

A group $G$ is a set equipped with a mapping $G\times G\to G$, i.e., with a rule that unambiguously describes how to create one element of $G$ out of an ordered pair of given ones. This operation is frequently called the multiplication or the addition and is denoted by $g = g_1g_2$ or $g = g_1 + g_2$, respectively. Moreover, there is a special required element in $G$, called the unit element or the neutral element, usually denoted by $1$ in the multiplicative notation and by $0$ in the additive notation. Furthermore, there is one more required operation, the inversion, that sends any element $g$ of $G$ to its inverse element $g^{-1}$ (resp. opposite element $-g$ when additive notations are used). Finally, the group operations are required to obey the following group axioms (using multiplicative notations):

1. For all elements $g_1$, $g_2$, and $g_3$, $g_1(g_2g_3) = (g_1g_2)g_3$.
2. For any element $g$, $g1 = 1g = g$.
3. For any element $g$, $gg^{-1} = g^{-1}g = 1$.

If a group $G$ also satisfies

4. For all elements $g_1$ and $g_2$, $g_1g_2 = g_2g_1$,

then $G$ is called an abelian group or a commutative group. To be precise, instead of speaking about a group $G$, we should say that $(G, \cdot, {}^{-1}, 1)$ is a group. Here $G$ is the set of group elements; $\cdot$ stands for the multiplication, ${}^{-1}$ for the inversion, and $1$ is the unit element. However, if there is no danger of confusion, we will use the notation $G$ for the group as well as for the underlying set.

Example 8.1.1. The integers $\mathbb{Z}$ form an abelian group having addition as the group operation, $0$ as the neutral element, and the mapping $n\mapsto -n$ as the inversion. On the other hand, the natural numbers $\mathbb{N} = \{1, 2, 3, \ldots\}$ do not form a group with respect to these operations, since $\mathbb{N}$ is not closed under the inversion $n\mapsto -n$, and there is no neutral element in $\mathbb{N}$.

Example 8.1.2. The nonzero complex numbers form an abelian group with respect to multiplication. The neutral element is $1$, and the inversion is the mapping $z\mapsto z^{-1}$.

Example 8.1.3. The set of $n\times n$ matrices over $\mathbb{C}$ that have a nonzero determinant constitutes a group with respect to matrix multiplication. This group, denoted by $GL_n(\mathbb{C})$ and called the general linear group over $\mathbb{C}$, is not abelian unless $n = 1$, in which case the group is essentially the same as in the previous example.

Regarding group axiom 1, it makes sense to omit the parentheses and write just $g_1g_2g_3 = g_1(g_2g_3) = (g_1g_2)g_3$. This clearly generalizes to products of more than three elements. A special case is a product $g\cdots g$ ($k$ times), which we will denote by $g^k$ (in the additive notation it would be $kg$).
We also define $g^0 = 1$ and $g^{-k} = (g^{-1})^k$ ($0g = 0$ and $(-k)g = k(-g)$ when written additively).

8.1.2 Subgroups, Cosets

A subgroup $H$ of a group $G$ is a group contained in $G$ in the following way: $H$ is contained in $G$ as a set, and the group operations (product, inversion, and the unit element¹) of $H$ are those of $G$ restricted to $H$. If $H$ is a subgroup of $G$, we write $H\le G$.

¹ The unit element can be seen as a nullary operation.

Let us now suppose that a group $G$ (written multiplicatively) has $H$ as a subgroup. Then, for each $g\in G$, the set

$$gH = \{gh\mid h\in H\} \qquad (8.1)$$

is called a coset of $H$ (determined by $g$). Simple but useful observations can easily be made; the first is that each element $g\in G$ belongs to some coset, for instance to $gH$. This is true because $H$, as a subgroup, contains the neutral element $1$, and therefore the coset $gH$ contains the element $g = g1$. This observation tells us that the cosets of a subgroup $H$ cover the whole group $G$. Notice especially that, for any $h\in H$, we have $hH = H$, because $H$, being a group, is closed under multiplication, and each element $h_1\in H$ appears in $hH$ ($h_1 = h(h^{-1}h_1)$). Other observations are given in the following lemmata.

Lemma 8.1.1. $g_1H = g_2H$ if and only if $g_1^{-1}g_2\in H$.

Proof. If $g_1H = g_2H$, then by necessity $g_2 = g_1h$ for some $h\in H$. Hence $g_1^{-1}g_2 = h\in H$. On the other hand, if $g_1^{-1}g_2 = h\in H$, then $g_2 = g_1h$, and so $g_2H = g_1hH = g_1H$. □

Remark 8.1.1. Notice that $g_1^{-1}g_2\in H$ if and only if $g_2^{-1}g_1\in H$. This is true because $H$ contains all inverses of its elements and $(g_1^{-1}g_2)^{-1} = g_2^{-1}g_1$. In the additive notation, the condition $g_1^{-1}g_2\in H$ would be written as $-g_1 + g_2\in H$.

Lemma 8.1.2. Let $H$ be a finite subgroup of $G$. Then all the cosets of $H$ have $m = |H|$ elements.

Proof. By the very definition (8.1), it is clear that each coset can have at most $m$ elements. If some coset $gH$ had fewer than $m$ elements, then for some $h_i\ne h_j$ we would have $gh_i = gh_j$. Multiplication by $g^{-1}$, however, gives $h_i = h_j$, which is a contradiction. □

Definition 8.1.1. If $g_1H = g_2H$, we say that $g_1$ is congruent to $g_2$ modulo $H$.

Lemma 8.1.3. If $g_1H\ne g_2H$, then also $g_1H\cap g_2H = \emptyset$.

Proof. Assume, on the contrary, that the distinct cosets $g_1H\ne g_2H$ have a common element $g$. Since $g\in g_1H\cap g_2H$, we must have $g = g_1h_1 = g_2h_2$ for some elements $h_1, h_2\in H$. But then $g_1 = g_2h_2h_1^{-1}$, and $g_1H = g_2h_2h_1^{-1}H = g_2h_2H = g_2H$, a contradiction. □

If $G$ is finite and $H\le G$, we call the number of cosets of $H$ the index of $H$ in $G$ and denote this number by $[G : H]$. Since the cosets cover $G$, we know by Lemma 8.1.3 that the group $G$ is a disjoint union of $[G : H]$ cosets of $H$, which all have the same cardinality $|H|$ by Lemma 8.1.2. Thus, we have obtained the following theorem.

Theorem 8.1.1 (Lagrange). $|G| = [G : H]\cdot|H|$.

Even though Lagrange's theorem was easy to derive, it has a very deep implication: the group structure, which does not look very demanding at first glance, is complicated enough to heavily restrict the cardinalities of the subgroups; the subgroup cardinality $|H|$ must always divide $|G|$. It follows that a group $G$ with a prime number cardinality can have only the trivial subgroups: the one consisting of the unit element alone, and the group $G$ itself.
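The coset bookkeeping above is easy to make concrete; the added sketch below lists the cosets of a subgroup of the additive group $\mathbb{Z}_{12}$ and checks that they partition the group into $[G : H]$ classes of size $|H|$, as Lagrange's theorem requires:

    # Cosets of the subgroup H = {0, 4, 8} (generated by 4) in the additive group Z_12.
    n = 12
    G = set(range(n))
    H = {0, 4, 8}

    cosets = {frozenset((g + h) % n for h in H) for g in G}
    assert all(len(c) == len(H) for c in cosets)    # Lemma 8.1.2
    assert set().union(*cosets) == G                # the cosets cover G
    assert len(cosets) * len(H) == n                # Lagrange: |G| = [G:H]*|H|
    print(sorted(sorted(c) for c in cosets))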
Appendix B: Mathematical Background (8.2) But, as far as we know, two distinct elements can define the same coset: g1H = g2H may hold even if gl =f. g2. Can the coset product ever not be well-defined, Le., that the product would depend on the representatives g1 and g2 which are chosen? The answer is no: Lemma 8.1.4. If H is a normal subgroup, g1H then also (glg2)H = (g~g~)H. = g~ Hand g2H = g~H, Proof. By assumption and Lemma 8.1.1, gllg~ = hI E H and g:;1g~ = h2 E H. But since H is normal, we have " ( g1g2 ) -1(, g1g2') = g2-1 gl-1 g1g2 = g2-1h' 192 = g:;1hlg29:;lg~ = g:;lh 1g2h 2 E H. The conclusion that g:;lh 1g 2h 2 E H is due to the fact that g:;lh 1g2 is in H because H is normal. Therefore, by Lemma 8.1.1, (glg2)H = (g~g~)H. 0 The coset product offers us a method of defining the factor group G / H: Definition 8.1.2. Let G be a group and H :S G a normal subgroup. The factor group G / H has the cosets of H as the group elements and the coset product as the group operation. The neutral element of G / H is lH = Hand the inverse of a coset gH is g-1 H. The factor group is also called the quotient group. Using the above terminology, IG/HI = [G: H]. Example 8.1.4. Consider Z, the additive group of integers and a fixed integer n. Then, all the integers divisible by n form a subgroup n?l = {... - 3n, -2n, -n, 0, n, 2n, 3n, ... }, as is easily verified. Any coset of the subgroup nZ looks like k + nZ = {... k - 3n, k - 2n, k - n, k, k + n, k + 2n, k + 3n ... } and two integers k 1, k2 are congruent modulo n?l if and only if kl + nZ = k2 + nZ which, by Lemma 8.1.1, holds if and only if k2 - k1 E n?l, i.e, k2 - k1 is divisible by n. The factor group Z/(nZ) is usually denoted by ?In. Definition 8.1.3. If integers kl and k2 are congruent modulo nZ, we also say that kl and k2 are congruent modulo n and denote k1 == k2 (mod n). Let us pay attention to the subgroups of a finite group G of special type. Pick any element 9 E G and consider the set } { 9 0 ,g 1 ,g 2 , .... (8.3) The set (8.3) is contained in G and must, therefore, be finite. It follows that, for some i > j, equality gi = gj holds, and multiplication by (gj)-1 gives gi- j = 1. This supplies the motivation to the following definition. 8.1 Group Theory 149 Definition 8.1.4. Let G be a finite group and let g E G be any element. The smallest nonnegative integer k such that gk = 1 is called the order of 9 and denoted by k = ord(g). It follows that, if k = ord(g), then gl+km = glgkm = gllm = gl for any integer m. Moreover, it is easy to see that the set (8.3) is an abelian subgroup of G. In fact, the set (8.3) is clearly closed under multiplication and the inverse of an element gl can be expressed as g-l+km, where m is any integer. We say that set (8.3) is a cyclic subgroup of G generated by 9 and denote that by k-I} (g=g,g,g, ) { 0 1 2... ,g . According to Lagrange's theorem, k Thus, we have obtained (8.4) = ord(g) = l(g)1 always divides IGI. Corollary 8.1.1. For each g E G, we have glGI = 1. Proof. Since IGI is divisible by k = ord(g), we can write integer l, and then glGI = gk.1 = 11 = 1. 8.1.4 Group IGI = k . l for some 0 Z~ Consider again the group Zn. Although the group operation is the coset addition, it is also easy to see that the coset product is a well-defined concept. In fact, suppose that kl + nZ = k~ + nZ and k2 + nZ = k2 + nil. Then kl - k~ and k2 - k2 are divisible by n, and also klk2-k~ k2 = (kl -kDk2+k~ (k2 -k 2) is divisible by n. Therefore, k 1 k 2+nZ = k~k2 + nZ. 
Coset 1 + nZ is clearly a neutral element with respect to multiplication, but the cosets k + nZ do not form a multiplicative group, the reason being that the cosets k + nZ such that gcd(k, n) > 1 do not have any inverse. To see this, assume that k + nZ has an inverse k' + nZ, i.e., kk' + nZ = 1 + nZ. This means then that kk' - 1 is divisible by n and, hence. also divisible by gcd(n, k). But then 1 would also be divisible by gcd(n, k), which is absurd because gcd( n, k) > 1. To remove the problem of non-invertible elements, we define Z~ to be the set of all the cosets k + nZ that have a multiplicative inverse; thus, Z~ becomes a multiplicative group. 2 Previously, we saw that all cosets k + nil with gcd(k, n) > 1 do not belong to il~ and next we will demonstrate that 2 To be more algebraic, the group Z;. is the unit group of ring Zn. The fact that nZ is an ideal of ring Z automatically implies that the coset product is a well-defined concept. 150 8. Appendix B: Mathematical Background consists exactly of cosets k + nZ such that gcd(k, n) = 1 (Notice that this property is independent of the representative k chosen). For that purpose we have to find an inverse for each such coset, which will be an easy task after the following lemma. Z~ Lemma 8.1.5 (Bezout's identity). For any natural numbers x, y there are integers a and b such that ax + by = gcd(x, y). Proof. By induction on M = max{x, y}. If M = 1, then necessarily x = y = 1 and 2 ·1-1·1 = 1 = gcd(l, 1). For the general case we can assume, without loss of generality, that M = x> y. Because gcd(x - y, y) is a divisor of x - y and y, it also divides x, so gcd(x - y, y) :5 gcd(x, y). Similarly, gcd(x, y) :5 gcd(x - y, y) and, hence, gcd(x - y, y) = gcd(x, y). We apply the induction hypothesis to the pair (x - y, y) to get numbers a' and b' such that a'(x - y) + b'y = gcd(x - y, y) = gcd(x, y), which gives the required numbers a = a', b = b' - a'. o If gcd(k, n) = 1, we can use the previous lemma to find the inverse of the coset k + nZ: let a and b be the integers such that ak + bn = 1. We claim that the coset a + nZ is the required multiplicative inverse. But this is easily verified: since ak -1 is divisible by n, we have ak + nZ = 1 + nZ and therefore (a + nZ)(k + nZ) = 1 + nZ. How many elements are there in group Z~? Since all the cosets of nZ can be represented as k + nZ, where k ranges over the set {D, ... , n - I}, we have to find out how many of those k values satisfy the extra condition gcd(k, n) = 1. The number of such values of k is denoted by cp(n) = IZ~I and is called Euler's cp-function. Let us first consider the case that n = pm is a prime power. Then, only the number.s Dp, Ip, 2p, ... , (pm-l - l)p in the set {D, 1, ... ,pm - I} do not satisfy the condition gcd( k, pm) = 1. Therefore, cp(pm) = pm _ pm-I. Especially cp(p) = p - 1 for prime numbers. Assume now that n = nl ... n r , where numbers ni are pairwise coprime, i.e., gcd(ni,nj) = 1 whenever if:. j. We will demonstrate that then cp(n) = cp(nl)'" cp(nr) and, for that purpose, we present a well-known result: Theorem 8.1.2 (Chinese Remainder Theorem). Let n = nl ... nr with gcd(ni' nj) = 1 when i f:. j. Then, for given cosets k i + niZ, i E {I, ... , r} there exists a unique coset k + nZ of nZ such that for each i E {I, ... , r} k + niZ = ki + niZ. Proof. Let mi = n/ni = nl ... ni-lni+1 ... n r . Then, clearly gcd(mi' ni) = 1, and, according to Lemma 8.1.5, aimi + bini = 1 for some integers ai and bi . Let 8.1 Group Theory 151 Then is divisible by ni, since each mj with j :f. 
i is divisible as well, and aimi -1 is also divisible by ni because aimi +bini = 1. It follows that k+niZ = k i +niZ, which proves the existence of such a coset k + nZ. If also k' + niZ = k i + niZ for each i, then k' - k is divisible by each ni and, since the numbers ni are coprime, k' - k is also divisible by n = nl ... n r . Hence, k' + nZ = k + nZ. o Clearly, any coset k + nZ E Z~ defines r cosets k + niZ E Z~i. But in accordance with Chinese Remainder Theorem, all the cosets k i + niZ E Z~i are obtained in this way. To show that, we must demonstrate that, if we have gcd(ki' ni) = 1 for each i, then the k which is given by the Chinese Remainder Theorem also satisfies gcd(k, n) = 1. But this is straightforward: if gcd(k, n) > 1, then also d = gcd(k', ni) > 1 for some i and some k' such that k = k'k". Then, however, k'k" + niZ = k + niZ (j. Z~. It follows that and therefore, cp(nl)··· cp(nr) = IZ~ll·· ·IZ~J = IZ~l x ... x Z~J = IZ~I = cp(n). Now we are ready to count the cardinality of Z~: let n prime factorization of n, Pi :f. Pj whenever i :f. j. Then cp(n) = cp(p~l) ... cp(p~r) = p~l ... p~r be the (p~l _ p~l-I) ... (p~r _ p~r-I) = p~l (1 - ~) ... p~r(1 PI = - ~) = n(l - ~) ... (1 - ~). PI Pr Pr By expressing the Corollary 8.1.1 on Z~ using the notations of Definition (8.1.3), we have the following corollary. Corollary 8.1.2. If gcd(a, n) = 1, then a'l'(n) This result is known as Euler's theorem. If n and Euler's theorem can be formulated as aP- 1 =: 1 =: 1 (mod n). = P is a prime, then cp(p) = p-l (mod p). Congruence (8.5) is known as Fermat's little theorem. 8.1.5 Group Morphisms Let G and H be two groups written multiplicatively. (8.5) 152 8. Appendix B: Mathematical Background Definition 8.1.5. A mapping f : G ---+ H such that f(glg2) = f(gI)f(g2) for all gl, g2 EGis called a group morphism from G to H. It follows that f(l) = f(1 . 1) = f(l)f(l) and, hence, f(l) = 1. Moreover, 1 = f(l) = f(gg-l) = f(g)f(g-I), which implies that f(g-l) = f(g)-I. By induction it also follows that f(gk) = f(g)k for any integer k. Definition 8.1.6. Let f : G ---+ H be a group morphism. 1. The set Ker(f) = f-l(l) = {g E G I f(g) = I} is called the kernel of f· 2. The set Im(f) = f(G) = {f(g) I 9 E G} is called the image or the range of f· If f(k 1) = f(k 2) = 1, then also f(k 1k 2) = 1 and f(k 1 1) = 1, i.e., Ker(f) is closed under multiplication and inversion, which means that Ker(f) is a subgroup of G. In fact, if k E Ker(f)' then also f(gkg- 1) = f(g)f(k)f(g)-1 = f(g)f(g)-1 = 1, so Ker(f) is even a normal subgroup. Similarly, it can be seen that Im(f) is a subgroup of H, not necessarily normal. A morphism f : G ---+ H is injective if gl =I g2 implies f(gl) =I f(g2), surjective if f(G) = H and bijective if it is both injective and surjective. Injective, surjective and bijective morphisms are called monomorphisms, epimorphisms, and isomorphisms respectively. Two groups G and G' are called isomorphic, denoted by G :::: G', if there is an isomorphism f : G ---+ G'. Notice that two isomorphic groups differ only in notations (any element 9 is replaced with f(g)). An interesting property is given in the following lemma: Lemma 8.1.6. Let f : G ---+ H be a group morphism. Then the factor group G /Ker(f) is isomorphic to Im( H). Proof. First notice that Im(H) is always a subgroup of H and that Ker(f) is a normal subgroup of G, so the factor group G /Ker(f) can be defined. We will denote K = Ker(f) for short and define a function F : G/ K ---+ Im( H) by F(gK) = f(g). 
The first thing to be verified is that F is well-defined, i.e, the value of F does not depend on the choice of the coset representative g. But this is straightforward: If glK = g2K, then by Lemma 8.1.1, g1 1g2 E K and, by the definition of K, f(g1 1g2) = 1, which implies that f(gI) = f(g2) and F(gIK) = f(gl) = f(g2) = F(g2K). It is clear that F is a group morphism. The injectivity can be seen as follows: if F(gIK) = F(g2K), then by definition, f(gl) = f(g2), hence j(g1 1g2) = 1, which means that g1 1g2 E K. But this is to say, by Lemma 8.1.1, that glK = g2K. It is clear that F is surjective, hence F is an isomorphism. D 8.1 Group Theory 153 8.1.6 Direct Product Let G and G' be two groups. We can give the Cartesian product G x G' a group structure by defining It is easy to see that G x G' becomes a group with (1, I') as the neutral element 3 and (g, g') f-t (g-l, g,-I) as the inversion. The group G x G' is called the (outer) direct product of G and G'. The direct product is essentially commutative, since G x G' and G' x G are isomorphic; an isomorphism is given by (g,g') f-t (g', g). Also, (G 1 XG 2)XG3 ~ G 1 X(G 2 XG 3) (isomorphism given by ((gl,g2),g3) f-t (gb(g2,g3)), so we may as well write this product as G 1 x G 2 X G3. This generalizes naturally to the direct products of any number of groups. If a group G has subgroups HI and H2 such that G is isomorphic to HI x H 2, we say that G is the (inner) direct product of subgroups HI and H 2. We then write also G = HI X H 2. Notice also that in this case, the subgroup HI (and also H 2) is normal. To see this, we identify, with some abuse of notations, G and HI x H2 and HI = {(h, 1) I hE HI}. Then (hI, h2)(h, l)(hb h 2)-1 = (hlhhll, 1) E HI, as required. The following lemma will make the notion of the factor group more natural: Lemma 8.1.7. JfG = HI x H2 (inner) then G/H1 ~ H 2. Proof. Since G = HI X H 2, there is an isomorphism f : G --+ HI X H2 which gives each 9 EGa representation f(g) = (hI, h 2), where hI E HI and h2 E H 2. A mapping p : HI x H2 --+ H2 defined by p(hI, h 2) = h2 is evidently a surjective morphism, called a projection onto H 2. Because f is an isomorphism, the concatenated mapping pf : G --+ H2 is also a surjective morphism. Now p(f(g)) = 1 if and only if f(g) = (hI, 1), which happens if and only if 9 E HI. This means that Ker(pJ) = HI, so by Lemma 8.1.6 we have G/H1 ~ H 2. 0 Example B.l.5. Consider again the group Z~, where n = decomposition of n. It can be shown that the mapping p~l ... p~r is a prime given by the Chinese Remainder Theorem is an isomorphism. 3 Here 1 and I' are the neutral elements of G and G' respectively. 154 8. Appendix B: Mathematical Background 8.2 Fourier Transforms 8.2.1 Characters of Abelian Groups Let G be a finite abelian group written additively. A character of G is a morphism X : G -+ C \ {O}, i.e., each character satisfies the condition X(gl + g2) = X(gl)x(g2) for any elements gl and g2 E G. It follows that X(O) = l. Denoting n = IGI, we also have by Corollary (8.1.1) that X(g)n = x(ng) = X(O) = 1, so any character value is a nth root of unity.4 An interesting property of characters is that for two characters Xl, X2, we can define the product character X1X2 : G -+ C by X1X2(g) = X1(g)X2(g). Moreover, it is easy to see that, with respect to this product, the characters also form an abelian group called the character group or the dual group of G and denoted by C. 
The neutral element XO of the dual group is called the principal character or the trivial character and is defined by XO (g) = 1 for each element 9 E G. The inverse of a character X is character X- 1 defined by X- 1(g) = X(g)-l. Example 8.2.1. Let us determine the characters of a cyclic group G = {g, 2g, ... , (n - l)g, ng = O}. It is easy to see that any cyclic group of order n is isomorphic to Zn, the additive group of integers modulo n, so we can consider Zn as a "prototype" when studying cyclic groups. For any fixed y E Z we define mapping Xy : Z -+ C by Since e 27ri = 1, Xy has period n, so we can, in fact, consider Xy as a mapping Zn -+ C. Moreover, since Xy(x) = Xx(Y), we can also assume that y E Zn instead of y E Z. Now Xy(x + z) = e 21t'iy(x+z) n 271'ixy 21'!'uy = e-n-e-n- = Xy(x)Xy(z), which means that each Xy is, in fact, a character of Zn. Moreover, if y and z are some representatives of distinct cosets modulo n, then also Xy and Xz are different. Namely, if Xy = Xz, then especially Xy(l) = Xz(l), i.e., e 2:: y = e h;z , which implies that y = z + k . n for some integer k. But then y and z represent the same coset, a contradiction. It is straightforward to see that XaXb = Xa+b, so the characters of Zn also form a cyclic group of order n, generated by Xl, for instance. This can be summarized as follows: the character group of Zn is isomorphic to Zn; hence, the character group of any cyclic group is isomorphic to the group itself. 4 A nth root of unity is a complex number x satisfying xn = l. 8.2 Fourier Transforms 155 The fact that the dual group of a cyclic group is isomorphic to the group itself, can be generalized: a well-known theorem states that any finite abelian group G can be expressed as a direct sum5 (or as a direct product, if the group operation is thought as multiplication) of cyclic groups G i : (8.6) Lemma 8.2.1. Let G be an abelian 9rouP as above. Then G ::: G. Proof By Example 8.2.1 we know that for each i, Gi ::: Gi , so it suffices to demonstrate that G = G1 X ... x Gm. For that purpose, let Xl. ... ,Xm be some characters of G 1 , ... , Gm . Since G = G 1 EB .. . EBGm , each element 9 E G can be uniquely expressed as 9 =91 + ... +9m, where 9i E G i . This allows us to define a function X : G --t C \ {O} by (8.7) It is now easy to see that X is a character of G. Moreover, if X~ =I- Xi, then there is an element 9i E G i such that XH9i) =I- Xi(9i) and, hence, X'(9i) = X1(0) ... X~(9i) ... Xm(O) =I- Xl (0) ... Xi(9i) ... Xm(O) = X(9i), which means that characters of G defined by Equation (8.7) are all different for different choices of Xl. ... , Xm· On the other hand, all the characters of G can be expressed as in (8.7). For, if X is a character of G, we can define Xi by restricting X to G i . It is easy to see that each Xi is a character of G i and that X = Xl ... Xm. 0 As an application, we will find some important characters. Example 8.2.2. Consider 1FT, the m-dimensional vector space over the binary field. Each element in the additive group of 1FT has order 2, so group 1FT has a simple decomposition: IF2 = IF2 EB ... EB IF2. 'm-components ----' Now it suffices to determine the characters of IF2, since by Lemma 8.2.1, characters of 1FT are just the m-fold products of the characters of IF 2 . But the characters of IF2 = Z2 were already found in Example 8.2.1: They are Xy(x) = e 5 21fi:z:y 2 = (_l)XY Group G is a direct sum of subgroups G I , ... , G m if each element of G has unique representation g = gl + ... + gm, where gi E G i . 156 8. 
Appendix B: Mathematical Background for Y E {O, 1}. Therefore, any character of 1F2' can be written as where x . Y = x XlYl + ... + XmYm = (Xl, ... ,xm ) and Y = (Yl,'" is the standard inner product of vectors ,Ym). 8.2.2 Orthogonality of the Characters Let G = {gl,'" ,gn} be a finite abelian group. The functions f : G --+ C clearly form a vector space V over C, addition and scalar multiplication defined pointwise: (f + h)(g) = f(g) + h(g) and (c· f)(g) = c· f(g) (see also Section 8.3). We also have an alternative way of thinking about V: each function f : G --+ C can be considered as an n-tuple (8.8) so the vector space V evidently has dimension n. The natural basis of V is given by el = (1,0, ... ,0), e2 = (0,1, ... ,0), ... , en = (0,0, ... ,1). In other words, ei is a function G --+ C defined as 1, if i = j ei(gj) = { 0, otherwise. The standard inner prod uct in space V is defined by n (f I h) = L !*(gi)h(gi), (8.9) i=l and the inner product also induces norm in the very natural way: Ilhll = J(hlh). Basis {el,'" en} is clearly orthogonal with respect to the standard inner product. Another orthogonal basis is the characters basis: Lemma 8.2.2. If Xi and Xj are characters of G, then 0, if i f:. j (xiIXj)= { n,ifi=j. (8.10) Proof. First, it is worth noticing that 1 = Ix(g)1 2 = X*(g)X(g), which implies that x*(g) = X(g)-l for any 9 E G. Then (Xi I Xj) n =L k=l n X;(gk)Xj(gk) =L k=l n X;1(9k)Xj(9k) = L(X;lXj)(9k). k=l 8.2 Fourier Transforms 157 If i = j, then XilXj is the principal character and the claim for i = j follows immediately. On the other hand, if i =f=. j, then X = XilXj is a nontrivial character of G and it suffices to show that n S = LX(9k) = 0 k=l for a nontrivial character X. Because of nontriviality, there exists an element 9 E G such that x(g) =f=. 1. Furthermore, mapping 9 H 9 + gi is a permutation of G for any fixed g, so n n n k=l k=l k=l The claim S = 0 follows now from the equation (1 - X(g))S o = O. Since the characters are orthogonal and there are n = IGI of them, they also form a basis. By scaling the characters to the unit norm, we obtain an orthonormal basis B = {B l , ... , B n }, where Other interesting features of the characters can be easily deduced: let us define a matrix X E en x n by By denoting the transposed complex conjugate of X by X*, since Xij Xi(gi), we get n (X* X)ij = L X;kXkj k=l n = L X:(gk)Xj(gk) = (Xi I Xj) k=l = {O,n,lfz=], ~f ~ =f=. j which actually states that X* X = nI, which implies that X- l since any matrix commutes with its inverse, we also have XX* can be written as or as = ~X*. = nI, But which 158 8. Appendix B: Mathematical Background ~ () * () ~ Xk gi Xk gj = k=l {O,n ifif ii =f.= Jj.. (8.11) ' Equations (8.10) and (8.11) are known as the orthogonality relations of characters. Notice also that, by choosing Xj as the principal character in (8.10) and gj as the neutral element in (8.11), we get a useful corollary: Corollary 8.2.1. {n, {n, ~ ( )= ~ X gk k=l ~ () = ~ Xk 9 k=l if X is the principal chamcter 0 otherwise. ' (8.12) if 9 is the neutml element 0 otherwise. ' (8.13) 1F2' we have a nice symmetry: Xi(gj) = Xj(gi). Therefore, the above formulae can be contracted to one formula in both groups. In Zn it becomes In groups Zn and L e 2"ixy n == { yEZn and in n, if x = 0 0, otherwise, 1F2' we have " (-l):Z:.Y ~ yEIF2' = {20,m , otherWIse. if X =? 
8.2.3 Discrete Fourier Transform We have now all the tools for defining the discrete Fourier transform: any element f E V (recall that V is the vector space of functions G --+ C) has unique representation with respect to basis B = {JnXI, ... , JnXn}: f = hB1 + ... + fnBn. (8.14) f: Definition 8.2.1. The function G --+ C defined by h where is the coefficient of Bi in the representation (8.14) is called the discrete Fourier tmnsform of f. Because B is an orthonormal basis, we can easily extract any coefficient fi by calculating the inner product of Bi and (8.14): (Bi I f) n n j=l j=l = L(Bi I J;Bj ) = L J;(B i I B j ) =K 8.2 Fourier Transforms 159 Thus, the Fourier transform can be written as (8.15) By its very definition, it is clear that functions f, h and any c E C. -r+h = 1+ Ii and -;;j = c1 for any Example 8.2.3. An interesting corollary of the orthogonality relation (8.11) can be easily derived: IIill 2 = (11 1) = = t; vn 1 n 1 n n L!*(9i)1(9i) i=1 n t;Xi(9k)f*(9k) n 1 n vn {;X:(91)f(91) n = - ~~f*(9k)f(91) ~Xi(9k)X:(9!) n 1 = ;;: k=11=1 L n i=1 f* (9k)f(9k)n = (J I J) = IIfll2. k=1 The equation IIill = IIfil thus obtained is known as Parseval's identity. Example 8.2.4. Let G = are all of form ::En. As we saw in Example (8.2.1), the characters of ::En 27ri:r:y Xy(x) = e n , and, therefore, the Fourier transform of a function ~ f(x) = f : ::En ~ C takes form 2",,,y vn1 """"' ~ ef(y)· n yEZ" Example 8.2.5. The characters of group G = IFr are so the Fourier transform of f i(x) = ~ L v2··' lIEF2' : IFr ~ C looks like (-l)"'·lIf(y). This Fourier transform in IFr is also called a Hadamard transform, Walsh transform, or Hadamard- Walsh transform. 160 8. Appendix B: Mathematical Background 8.2.4 The Inverse Fourier Transform Notice that Equation (8.15) can be rewritten as (8.16) and that the matrix appearing in the right-hand side of (8.16) is X*, the transpose complex conjugate of matrix X defined as X ij = Xj(gi).6 By multiplying (8.16) by X we get (8.17) which gives the idea for the following definition. Definition 8.2.2. Let f : G -+ C be a function. The inverse Fourier transform of f is defined to be (8.18) Keeping equations (8.16) and (8.17) in mind, it is clear that f = 1= f. Example 8.2.6. In lEn we have Xx(y) = Xy(x), so the inverse Fourier transform is quite symmetric to the Fourier transform: !(x) = ~ L e 2"~xy f(y). yE'Zn In group - f(x) 6 lFi the symmetry is perfect: = /'i\ffi L.J (-l)""Yf(y) 1" v 2··· Y ElF'" ~ = f(x). 2 Equation (8.16) makes Parseval's identity even clearer: matrices are unitary and, therefore, they preserve the norm in V. JnX and JnX· 8.3 Linear Algebra 161 8.2.5 Fourier Transform and Periodicity We will now give an illustration on how powerfully the Fourier transform can extract information about the periodicity. Let f : G -t C be a function with period pEG, i.e., f(g + p) = f(g) for any 9 E G. Then ~ 1 n f(gi) = y'ri LX:(9k)f(9k) k=l 1 n = y'riLXi(9k+P-p)*f(9k+P) k=l = X:( -p) In t X:(gk + p)f(gk + p) k=l = X:( -p)f(gi) , which implies that f(gi) = 0 whenever X: (-p) = Xi(-p)-IXi(P) =/:-1. Example 8.2. 7. In group lEn the above equation takes form - f(x) = e- 27ri:x:pn f(x), • - 2wia:p. whIch means that f(x) = 0 whenever e- n =/:- 1, whIch happens exactly when xp is not divisible by n. If also gcd(p, n) = 1, then -xp can be divisible by n and is so if and only if x is. 
8.3 Linear Algebra 8.3.1 Preliminaries A vector space over complex numbers is an abelian group V (usually written additively) that is equipped with scalar multiplication, which is a mapping C x V -t V. The scalar multiplication is usually denoted by (c,x) H cx. It is required that the following vector space axioms must be fulfilled for all C,CI,C2 E C and X,XI,X2 E V: 1. CI(C2X) = (CIC2)X. 2. (CI + C2)X = CIX + C2X. 3. C(XI + X2) = CXI 4.1x = x. + CX2. Again, to be precise, we should talk about vector space (V, +, 0, -, C,·) instead of space V, but to simplify the notations we identify the space and the underlying set V. The elements of V are called vectors. 162 8. Appendix B: Mathematical Background Example 8.3.1. Let V en equipped with sum = en, the set of n-tuples over complex numbers. Set zero element (0, ... ,0) and inversion is evidently an abelian group. This set equipped with scalar multiplication becomes a vector space, as is easily verified. Example 8.3.2. The set of functions x : N ~ e, for which the series 00 is convergent, also constitutes a vector space over Co The sum and scalar product are again defined pointwise: for functions x and y, (x + y)(n) = x{n) + y(n) and (cx)(n) = cx(n). To simplify the notations, we write also Xn = x{n). To verify that the sum of two elements stays in the space, we also have to check that the series L:~l IX n + Ynl 2 is convergent whenever L:~=l IXnl2 and L:~=l IYnl 2 are convergent. For this purpose, we can use estimations Ix + Yl2 = IxI 2+ 2Re{x*y) + lyl2 :S Ixl2 + 21xllyl + lyl2:S 21xl2 + 21yl2 for each summand. The vector space of this example is denoted by £2(C). Notice that the domain N in the definition could be replaced with any numerable set. If, instead of N, there is a finite domain {I, ... , n}, the convergence condition would be unnecessary and the space would become essentially the same as en. Definition 8.3.1. A linear combination of vectors Xl, ... , Xn is a finite sum + ... + CnX n . A set S c V is called linearly independent if C1Xl implies Cl = ... = Cn = 0 whenever Xl, independent is linearly dependent. ... , Xn E S. A set that is not linearly Definition 8.3.2. For any set S ~ V, £(S) is the set of all linear combinations of vectors in S. By its very definition, set £(S) is a vector space contained in V. We say that £(S) is generated by S and also call £(S) the span of S. 8.3 Linear Algebra 163 The proof of the following lemma is left as an exercise. Lemma 8.3.1. A set 8 <; V is linearly dependent if and only if some vector x E 8 can be represented as a linear combination of vectors in 8 \ {x}. In ligth of the previous lemma, a linearly dependent set contains some "unnecessary" elements in the sense that they are not needed when making linear combinations. For, if x E 8 can be expressed as a linear combination of 8\ {x}, then clearly L(8\ {x}) = L(8) (cf. Exercise 1). Definition 8.3.3. A set Be V is a basis of V if V independent. = L(B) and B is linearly It can be shown that all the bases have the same cardinality (See Exercise 2). This gives the following definition. Definition 8.3.4. The dimension dim (V) of a vector space V is the cardinality of a basis of V. Example 8.3.3. Vectors el (0,0, ... , 1) form a basis of has dimension n. = (1,0, ... ,0), e2 = cn . This is a so-called (0,1, ... ,0), ... , en = natural basis. Thus C n , Example 8.3.4. As in the previous example, we could define ei E L 2 (C) by I, if i = n ei(n) = { 0, if i -# n for each i E N. It is clear that the set e = {el, e2, e3, ... 
} is linearly independent, but it is not a basis of L 2 (C) in the sense of definition (8.3.3), because there are vectors in L 2 (C) that cannot be expressed as a linear combination of vectors in e. This is because, according to definition (8.3.1), a linear combination is a finite sum of vectors (Exercise 4). Definition 8.3.5. A subset W <; V is a subspace of V if W is a subgroup of V that is closed under scalar multiplication. Example 8.3.5. Consider the n-dimensional vector space C n . Set is clearly a subspace of n-l. cn having el, ... en-l as basis. Hence, dim(W) = Example 8.3.6. Let W = {x E L 2 (C) I Xn -# 0 only for finitely many n EN}. It is straightforward to see that W is a subspace of L 2 (C) and that the set of example (8.3.4) is a basis of W. e 164 8. Appendix B: Mathematical Background 8.3.2 Inner Product A natural way to introduce geometry into a complex vector space V (a vector space over C n ) is to define the inner product. Definition 8.3.6. An inner product on V is a mapping Vx V -+ C, (x, y) H (x I y) that satisfies the following conditions for any C E C and any x, y, and z E V 1. (x 2. (x 3. (x I y) = (x I y)*. I x) ;::: 0 and (x I x) = 0 if and only if x = I clY + C2Z) = Cl(X I y) + C2(X I z). O. In axiom 1 and hereafter, * means complex conjugation. Notice that by axiom 1 it follows that (x I x) is always real, so axiom 2 makes sense. From axioms 1 and 3 it follows that A vector space equipped with an inner product is also called an inner product space. Example 8.3.7. For vectors x = (Xl, .. . , x n), and Y = (Yl,"" Yn) in formula (x I y) cn the = XiYl + ... + x~Yn defines an inner product, as is easily verified. An inner product of x and y E £2(([;) can be defined by formula (8.19) since the series (8.19) is convergent. This is because and the series L~=l IXn l2 and L~=l IXnl2 converge by the very definition of £2(([;). Definition 8.3.7. Vectors x and yare orthogonal if (x I y) = O. Two subsets Sl ~ V and S2 ~ V are mutually orthogonal if all Xl E Sl and X2 E S2 are orthogonal. A set S ~ V is orthogonal if Xl E Sand X2 E S are orthogonal whenever Xl =I X2 8.3 Linear Algebra 165 An orthogonal set that does not contain the zero vector 0 is always linearly independent: for, if S is such a set and is a linear combination of vectors in S, then 0) = 0, which implies Ci = 0 since Xi =1= o. Example 8.3.8. Let W Wl. = {x ~ Ci(Xi 1 Xi) = (Xi 1 CiXi) = (Xi 1 V be a subspace. Then also E V 1 (x 1 y) = 0 for all YEW} is a subspace of V, the so-called orthogonal complement of W. Lemma 8.3.2 (Cauchy-Schwartz inequality). For any vectors x, y E V, I(x 1 y)1 2 S (x 1 x)(y 1 y). Proof. If y = 0, then (x 10) = (x 10 + 0) = (x 1 0) + (x 1 0), so (x 10) = 0 and the claim holds with equality. If y =1= 0, we can define A = - i~I:;. Then, by the inner product axiom 2, Os (x + AY 1x + AY) = (x 1x) + A(X 1y) + A*(Y 1x) + IAI2 (y 1y), which can also be written as oS (x 1 x) - ~: : :~ (x 1 y), and the claim follows. D Definition 8.3.8. A norm on a vector space is a mapping V such that, for all vectors x and y, and c E C we have l·lIxll 2: 0 and IIxll = 0 if and only if x 2·lIcxll = Iclllxll· 3. IIx + yll S IIxll + lIyll· = ~ JR., x H Ilxll o. A vector space equipped with a norm is also called a normed space . Any inner product always induces a norm by IIxll = vTxTX), so an inner product space is always a normed space as well. 
To verify that vTxTX) is a norm, it is easy to check that the conditions 1 and 2 are fulfilled; and for condition 3, we use the Cauchy-Schwartz inequality, which can also be stated as I(x I y)1 IIxlillyli. Thus, s IIx + Yll2 = (x s + y I x + y) = (lIxll + IIY11)2. + 2Re(x I y) + lIyW + lIyl12 s IlxW + 211xlillyll + lIyll2 = IIxl12 II x ll 2 + 21(x I y)1 166 8. Appendix B: Mathematical Background Any norm can be used to define the distance on V: d(x, y) = Ilx - YII· One can easily verify that d : V x V -t lR satisfies the axioms of a distance function: For each x, y, and Z E V, 1 d(x,y)=d(y,x)::::O. 2 d(x, y) = 0 if and only if x 3 d(x, z) ::; d(x, y) + d(y, z). = y. A characteristic property of a normed space that is also an inner product space is that (8.20) holds for any x, y E V (See Exercise 5). Equation (8.20) is called the Parallelogram rule. Definition 8.3.9. Let V be a normed space. If, for each sequence of vectors Xl, X2, ... such that lim Ilxm m,n-+oo - xnll there exists a vector X lim Ilxn - xii = m-too = 0, such that 0, we say that V is complete. A class of vectors space having great importance in quantum physics and in quantum computation is introduced in the following definition. Definition 8.3.10. An inner product space V is a Hilbert space if V is complete with respect to the norm induced by the inner product. Example 8.3.9. Both en and L 2 (<<::) are Hilbert spaces, but the subspace W of L 2 (<<::) in example (8.3.5) is not (cf. Exercise 6). Definition 8.3.11. A vector space V is a direct sum of subspaces WI and = WI EB W 2 , if V is a direct sum of the subgroups WI and W2• W 2 , denoted by V The next lemma shows that Hilbert spaces are structurally well-behaving: Lemma 8.3.3. If a subspace W of a Hilbert space V is itself W a Hilbert space, then V = WEB W.L (see Example (8. 3. 8}}. Definition 8.3.12. Let V and W be complex vector spaces. A mapping f : V -t W is a vector space morphism if f(clx + C2Y) = cd(x) + c2!(y) for each x, y E V and CI, C2 E linear mappings or operators. e. Vector space morphisms are also called 8.4 Number Theory 167 8.4 Number Theory 8.4.1 Euclid's Algorithm It should be emphasized that the proof of Lemma 8.1.5 already supplies a recursive method for computing the greatest common divisor of two given natural numbers: gcd(x - y, y), if x> y gcd(x, y) = { gcd(x, y - x), ~f x < y x, If x = y. It is clear that this method always gives gcd(x, y) correctly. However, the method is not quite efficient: for instance, in computing gcd(x, 1) it proceeds as gcd(x, 1) = gcd(x - 1,1) = gcd(x - 2, 1) = ... = gcd(l, 1) = 1, so it takes x recursive calls to perform this computation. Keeping in mind that the decimal representation of x has length £( x) = lloglO X J, the computational time is proportional to lOl(x) , i.e., the current algorithm is exponential in £(x). A more efficient method is given by Euclid's algorithm, whose core is given in the following lemma. Lemma 8.4.1. Given natuml numbeTs x > y. there are unique integeTs d and T such that x = dy + T wheTe 0 :S T < y. NumbeTs y and d can be found by using £(x)2 elementary arithmetic opemtions (multiplication, addition, and subtmction of two digits). Euclid's algorithm is based on consecutive application of the previous lemma: given x > y, we can find representations x=dIY+TI, Y Tl = d2Tl + T2, = d3T2 + T3, Tn-2 = dnTn-1 Tn-I + Tn, = dn+ITn . (8.21 ) (8.22) (8.23) (8.24) The above procedure certainly terminates with some Tn+! = 0, since Tl, T2, ... is a strictly descending sequence of non-negative numbers. 
We claim that gcd(x, y) equals to Tn, the last nonzero remainder in this procedure. First, by (8.24), Tn divides Tn-I. By (8.23) Tn also divides Tn-2. Continuing this way, we see that Tn divides all the numbers Ti and also x and y. Therefore, Tn is a common divisor of x and y. Assume, then, that there is another number s that divides both x and y. By (8.21), s divides TI' By (8.22), s divides 168 8. Appendix B: Mathematical Background r2 and again continuing this reasoning, we see that s eventually divides rn. Therefore rn is the greatest common divisor of x and y. How rapid is this method? Since each ri < x, we know that each iteration step can be done in a time proportional to .e(x)2. Thus, it suffices to analyze how many times we need to iterate in order to reach r n+1 = O. Because one single iteration step gives an equation where 0::; ri < ri-l, we have ri = ri-2 -diri-l < ri-2 -diri. The estimation ri-2 > (1 + di)ri 2:: 2ri follows easily. Therefore, x> y > rl > 2r3 > 4r5 ... > 2ir2i+1' n-l n-l If n is odd, we have x > 2-rrn 2:: 2-r. For an even n, we also have n-2 n-2 n-2 X > 2-rrn _l > 2-rr n , so the inequality x > 2-2- holds in any case. It follows that .e(x) = lloglO xJ > 10glO X - 1 > n2'2loglO 2 - 1. Therefore, n<2I2 (.e(x) + 1) + 2 = O(.e(n)). As a conclusion, we have that, for numoglO bers x > y, gcd(x, y) can be computed using O(.e(x)3) elementary arithmetic operations. 8.4.2 Continued Fractions Example 8.4.1. Let us apply Euclid's algorithm on pair (263,189): 263 = 1·189 + 74, 189 = 2·74 + 41, 74=1·41+33, 41 = 1· 33 + 8, 33 = 4·8 + 1, and the algorithm terminates at the next round giving gcd(263, 189) divisions we obtain 263 74 189 = 1 + 189' 189 _ 2 41 74 - + 74' 74 33 41 = 1 + 41' 41 8 33 = 1 + 33' 33 1 "8 = 4+ 8' = 1. By a set of equations such that the fraction occurring on the right-hand side is the inverse of the fraction on the left-hand side of the next equation. We can thus combine these equations to get a representation 8.4 Number Theory 263 - 189 = 1+ 1 2+ 1 169 (8.25) __--:;-__ 1 1+ 1 1+--1 4+ 8 Expression (8.25) is an example of a finite continued fraction. It is clear that this procedure can be done for any positive rational number 0: = ~ 2: 1 in the lowest terms. For other values of 0:, there is a unique way to express 0: = ao + {3 where ao E Z and (3 E [0,1). If {3 f. 0, this can also be written as 0: 1 = ao + -, 0:1 where 0:1 1 = -{3 > l. By applying the same procedure recursively to 0: 0:1, we get an expansion 1 = ao + - - - - - . 1 - - al (8.26) + -----:;1,----- where ao E Z, and aI, a2, ... are natural numbers. If 0: is an irrational number, then the sequence ao, aI, a2, ... never stops (otherwise 0: would be a rational number). Mainly for typographical reasons, we write (8.26) as ao E Z, ai E N, for i 2: 1, (8.27) and say that (8.27) is the continued fraction expansion of 0:. It is clear by its very construction that each irrational 0: has unique infinite continued fraction expansion (8.27). On the other hand, all rational numbers r have a finite continued fraction expansion ao E Z, ai E N for 1 ~ n (8.28) that can be found by using Euclid's algorithm. But the expansion (8.28) is never unique, as is shown by the following lemma. Lemma 8.4.2. If r has expansion (8.28) of odd length, the r also has expansion (8.28) of even length and vice versa. Proof. This follows straightforwardly from the identity lao, al,···, an-I, an] = lao, al,···, an-l 1 + -]. an If an 2: 2, then lao, al,"" an] = lao, al ... , an - 1,1]. 
If an lao, al,···, an-I, 1] = lao, al," ., an-2, an- l + 1]. 1, then o 170 8. Appendix B: Mathematical Background Example 8.4.2. The expansion (8.25) can be written in two ways: 263 189 = 1 + 1 1 2+ 1 1+ 1 = 1+ 2+ 1 1+--1 4+ 8 1 1 1+ 1 1+ 1 4+-1 7+ 1 which illustrates how we can either lengthen or shorten any finite continued fraction expansion by 1. It can, however, be shown that if a finite continued fraction expansion is required to end up by an > 1, then the representation is unique. Clearly, each finite continued fraction represents some rational number. But even though we now know that each irrational number has unique representation as an infinite continued fraction, we cannot tell a priori if each sequence where ao E Z and ai E N, when i 2 1 ao, at. a2, ... , (8.29) regarded as a continued fraction represents an irrational number. In other words, we do not yet know if the limit (8.30) exists for each sequence (8.29). We will soon find an answer to that question. Let (ai) be a sequence as in (8.29). The nth convergent of the sequence (ai) is defined to be fu = lao, ... ,an]. A simple calculation shows that q.. Po ao qo 1 PI aOal + 1 = ql al -=-, c!;) + 1 P2 aO(al + - = q2 al + I a2 = a2PI + Po a2qI + qo Continuing this way, we see that Pn and qn are polynomials that depend on ao, at. ... an. We are now interested in finding the polynomials Pn and qn, so we will regard ai as indeterminates, assuming that they do not take any specific integer values. After calculating some first Pn and qn, we may guess that there are general recursion formulae for polynomials Pn and qn· Lemma 8.4.3. For each n 2 2, + Pn-2, anqn-I + qn-2· Pn = anPn-1 qn = hold for each n 2 2. (8.31 ) (8.32) 8.4 Number Theory 171 Proof. By induction on n. The formulae (8.31) and (8.32) are true for n = 2, as we saw before. Assume then that they hold for numbers {2, ... ,n}. Then, Pn+1 - = [ao,···, an-b an, an+l 1= [ao,···, an-b an + -1- 1 qn+l = an+l 6 + )Pn-l + Pn-2 I (an + -a-)qn-l + qn-2 n+l (an an+1(anPn-1 + Pn-2) an+1(anqn-1 + qn-2) _ an+IPn + Pn-l , an+1 qn + qn-l = + Pn-l + qn-l o so the formulae (8.31) and (8.32) are valid for each n. If aI, a2, ... are natural numbers, then it follows by (8.32) that qn > qn-l + qn-2 and the inequality qn ~ Fn , where Fn is the nth Fibonacci number,7 can be proved by induction. In the sequel we will use notation Pn and qn also for the Pn and qn evaluated at {ao, ab ... , an}. The meaning of individual notation Pn or qn will be clear by the context. Using the recursion formulae, it is also easy to prove the following lemma: Lemma 8.4.4. For any n ~ 1, Pnqn-l - Pn-Iqn = (-It-I. From the previous lemma it also follows that the convergents E.!!. are always in their lowest terms: for if d divides both Pn and qn, then d al~o divides 1. Lemma 8.4.4 has even more interesting consequences: By multiplying (8.33) by an and using the recursion formulae (8.31) and (8.32) once more, we see that (8.34) whenever n ~ 2. Equations (8.33) and (8.34) can be rewritten as Pn Pn-l qn qn-l (_I)n-1 qnqn-l (8.35) and (-I)na n Pn Pn-2 ----= qn qnqn-2 qn-2 (8.36) Equation (8.35) implies inequality 7 Fibonacci numbers Fn are defined by Fo = H = 1 and by Fn = Fn- 1 + Fn-2. 8. Appendix B: Mathematical Background 172 P2n q2n < P2n-l , q2n-l which, together with (8.36), implies that P2n-2 q2n-2 < < P2n q2n P2n-l q2n-l < P2n-3. q2n-3 (8.37) Inequalities (8.37) show that the sequence of even convergents is strictly increasing but bounded above by the sequence of odd convergents which is strictly decreasing. 
It follows by (8.35), and by the fact that qn tends to infinity (recall that qn ? Fn), that both of the sequences converge to a limit a. Therefore, each sequence (8.29) represents some irrational number a. Moreover, since the limit a is always between two consecutive convergents, (8.35) implies that pn- a 1 <---<2". 1 1 - I - qnqn+l qn (8.38) qn Example 8.4.3. The continued fraction expansion of 11" begins with 11" = [3,7,15,1,292,1,1, ... J. Since q4 = 113 and q5 = 292 . q4 + q3 = 33102, ~ = ~~~ is a very good approximation of 11" with a relatively small nominator. By (8.38), 1 355 113 - 11" 1 :::; 1 113.33102 = 0.000000267342 .... In fact, 355 - 1 113 11" 1 = 0.000000266764 ... , so the estimation given by (8.38) is also quite precise. Remark 8.4.1. Inequality (8.38) tells us that, for an irrational a, the convergents fu give infinitely many rational approximations l?q with gcd(p, q) = 1 h of a so precise that (8.39) This has further number theory interest, since the rational numbers have only finitely many rational approximations (8.39) such that gcd(p, q) = 1. This is true because, if a = ~ with s > 0, then for q ? s and ~ =1= ~, ~ - ~I = Iq s 1 ps - qr I ? qs -.!.. qs ? -.!... q2 8.4 Number Theory 173 For 0 < q < s there are clearly only finitely many approximations ~ that satisfy (8.39) and gcd(p, q) = l. Thus, the irrational numbers can be characterized by the property that they have infinitely many rational approximations (8.38). The continued fraction expansion also gives finitely many rational approximations (8.39) for rational numbers. On the other hand, we will show in a moment (Theorem 8.4.3) that approximations which are good enough are convergents. It can even be shown that the convergents to a number a are the "best" rational approximations of a in the following sense. For the proof of the following theorem, consult [33J. Theorem 8.4.1. Let ~ be a convergent to a. If 0 < q ~ qn and ~ =I- ~, then whenever n ~ 2. Now we will show that approximations which are good enough are convergents. To do that, we must first derive an auxiliary result. Let a = lao, ab a2 ... J be a continued fraction expansion of a. We define an+l = [an+b a n+2, ... J and this gives us a formal expression This expression is not necessarily a continued fraction in the sense that an need not be a natural number, but anyway it gives us a representation ~ 1 (8.40) with Pnqn-l - Pn-lqn = (_l)n-l just like in the proof of Lemma 8.4.3. But also any representation looking "enough" like (8.40) implies that !2!. and Pn-l qn qn-l are convergents to a. More precisely: Theorem 8.4.2. Let P, Q, P', and Q' be integers such that Q and PQ' - QP' = ±l. If also a' ~ 1 and > Q' > 0 a'P+P' a = --:----:::a'Q + Q" then QP: ofa. = Pn-l qn-l and EQ = !2!. are convergents of a continuedfruction expansion qn 174 8. Appendix B: Mathematical Background Proof. Anyway, we have a finite continued fraction expansion for P Pn -. Q = [ao,al ... ,an] =qn (8.41) By Lemma 8.4.2, we can choose n to be even or odd, so we can assume, without loss of generality, that PQ' - QP' = (-It-I. (8.42) Also by Lemma 8.4.4, we have gcd(Pn' qn) = 1 and, similarly, gcd(P, Q) = 1. Since Q and qn have the same signs (Q is positive by assumption), it follows by (8.41) that P = Pn and Q = qn' Using this knowledge and Lemma 8.4.4, equation (8.42) can be rewritten as or even as (8.43) Since gcd(Pn,qn) = 1, it follows by (8.43) that qn divides Q' - qn-I. On the other hand, Q' - qn-I < Q' < qn since qn = Q > Q' > 0 by assumption. 
Also qn-I - Q' < qn-I < qn and, by combining these observations, we see that which can be valid only if Q' Hence, = qn-I. By (8.43) it also follows that P' = Pn-l. which can also be written as (8.44) just like in the proof of Lemma 8.4.3. Since a.' 2: 1, there is a continued 0 fraction expansion a.' = [an+! , an+2,"'] such that an+! 2: 1. Now we are ready to conclude this section by Theorem 8.4.3. If 0< then ~ IEq - 0'.1 <- _1_ 2q2' is a convergent to the continued fraction expansion of a.. 8.5 Shannon Entropy and Information 175 Proof. By assumption, () p --a-aq q2' for some 0 < () ~ ~ and a = ±l. Let ~ = lao, al, ... , an] be the continued fraction expansion for ~. Again by Lemma 8.4.2, we may assume that a = (_1)n-l. We define where fu = P.q and qn Thus, we have Pn-l qn-l are the last and the second-last convergents to E.. q and Thus, , - 1 qn-l 2 - 1 -- 1. a-----> () qn By Theorem 8.4.2, pn-l qn-l and fu = I!. qn q are consecutive convergents to a. D 8.5 Shannon Entropy and Information The purpose of this section is to represent, following [14], the very elementary concepts of information theory. In the beginning, we concentrate on information represented by binary digits: bits. 8.5.1 Entropy By using a single bit we can represent two different configurations: the bit is either 0 or 1. Using two bits, four different configurations are possible, three bits allow eight configurations, etc. Inspired by these initial observations, we say that the elementary binary entropy of a set with n elements is log2 n. The elementary binary entropy, thus, approximately measures the length of bit strings which we need to label all the elements of the set. 176 8. Appendix B: Mathematical Background Remark 8.5.1. Consider the problem of specifying one single element out of given n elements such that the probability of being the outcome is ~ for each element. The elementary binary entropy log2 n thus measures the initial uncertainty in bits, i.e., to specify the outcome, we have to gain log2 n bits of information. In other words, we have to obtain the binary string of log2 n bits that labels the outcome. On the other hand, using an alphabet of 3 symbols, we would say that the elementary ternary uncertainty of a set of n elements is log3 n. To get a more unified approach, we define the elementary entropy of a set of n elements to be H(n) = Klogn, where K is a constant and log is the natural logarithm. Choosing K = lo~ 2 gives the elementary binary information, K = lo~ 3 gives the ternary, etc. However, it feels well justified to say that the information needed in average is less than K log n if the elements appear as outcomes with non-uniform probability distribution. For example, if we have a priori knowledge that some element does not appear at all, we can say that the information needed, i.e., the initial uncertainty of the outcome, is at most K log( n - 1). How do we deal with a set of n elements, each appearing with a known probability of Pi?' Assume that we choose l times an element of a set having n elements with uniform probability distribution. This experiment can also be viewed as specifying a string of length lover an n-Ietter alphabet. There are n l such strings, and since the letters appear with equal probabilities, the strings do as well. Thus, the elementary entropy of the string set is H(nl) = K log n l = l· Klogn = l· H(n), which is exactly what we could expect: the elementary entropy of strings length l is l times the elementary entropy of a single letter. 
We use a similar idea to handle the non-uniform probability distributions. Let us choose l elements of an n-element set having probability distribution Pl, ... , Pn. Therefore, with large enough l, the ith letter in any such string should appear approximately k i = Pil times. Let us study a simplified case where each ki = Pil is an integer. A simple combinatorial calculation shows that there are exactly l! kl!···k n ! strings where the ith letter occurs ki = Pil times. Moreover, the distribution of such strings tends to uniform distribution as l tends to infinity. The elementary entropy of the set of such strings is 8.5 Shannon Entropy and Information l! K log k 1 ! ... k n ! = K(log l! 177 -log kl! - ... -log kn!). By using log k! = k log k - k + O(log k) and l = kl the average entropy (per letter) as + ... + kn, we can write H(Pl, ... ,Pn) 1 = T· K(llogl-l + O(logl) :L (ki log ki n - ki + O(logki ))) i=1 1 = K n T(:L k i log l + O(log l) n - i=1 :L ki log ki + o (log l)) i=1 n ki O( -log l) -_ - K:L -ki 1og-+ l l l i=1 ~ logl = -K ~Pi log Pi + O(-l-)· i=1 By letting l -+ definition. 00, the last term tends to o. This encourages the following Definition 8.5.1. Suppose that the elements of an n-element set occur with probabilities of PI, ... , Pn· The Shannon entropy of distribution PI, ... , Pn is defined to be H(Pl, ... ,Pn) = -K(PllogPl + .. ·Pn logPn). Remark 8.5.2. Because limp-toplogp H(pl, ... ,Pn) ;::: o. Remark 8.5.3. If K = = 0, we define 0 . logO = o. Clearly, in the above definition, we talk about binary Shannon entropy. The special case that Pi = ~ for each i yields H(n) = - K n . ~ log ~ = K log n, again giving the elementary entropy, which we could now call uniform entropy as well . lo! 2 Remark 8.5.4. Recall that the elementary binary entropy measures how many bits of information we have to receive in order to specify an element in a set of n elements with uniform probability distribution. In other words, it measures the uncertainty of outcome in bits. On the other hand, binary Shannon entropy measures the average uncertainty in bits in a set with a probability distribution PI, ... , Pn· Since log is a concave function,8 it follows easily that for any probability distribution PI, ... , Pn, inequality 8 A real function f is concave in an interval I if the graph of f is above the chord connecting any points f(XI) and f(X2), Xl, X2 E I. Formally, a concave function f must satisfy f(>"XI + (1 - >")X2) ~ >..f(XI) + (1- >..)f(X2) for>.. E [0,1] and Xl, X2 E I. 178 8. Appendix B: Mathematical Background Pllog Xl + ... + Pn log Xn ~ log(P1Xl + ... + Pnxn) holds true. Therefore, H(Pl,'" ,Pn) = Pllogpl l ~ log(P1Pl l + ... + Pn logp~l + ... + Pnp~l) = logn. Notice that the Shannon entropy H(Pl,'" ,Pn) also attains the above upper bound at Pl = ... = Pn = ~. Moreover, Shannon entropy has the following properties: 1. H (Pl, ... ,Pn) is a symmetric continuous function. 2. H ( ~, ... , ~) is a non-negative, strictly increasing function of variable n. 3. If (Pl, ... ,Pn) and (ql, ... ,qm) are probability distributions, then H(Pl,'" ,Pn) + PnH(ql,"" qm) = H(Pl,'" ,Pn-l,Pnql,'" ,Pnqm)' We will make the definition of Shannon entropy even more natural by showing that the converse also holds: Theorem 8.5.1. If H(Pl,'" ,Pn) satisfies the conditions 1-3 above, then H (Pl, ... ,Pn) = - K (Pl log Pl + ... + Pn log Pn) fOT some positive constant K. Proof. 
We will first prove an auxiliary result that will be of great help: if f is a strictly increasing, non-negative function defined on natural numbers such that f(sm) = mf(s), then f(s) = Klogs for some positive constant K. To prove this, we take an arbitrary natural number n and fix m such that sm ~ 2n < sm+l. Then m log 2 1 m -n -< - < -+-. logs n n (8.45) Because f is strictly increasing, we have also f(sm) < f(2 n ) < f(sm+l), which implies that < f(2) < m n - f(8) n m + ~. n (8.46) Inequalities (8.45) and (8.46) together imply that I f(2) f(s) _lOg21 <~. logs - n Since n can be chosen arbitrarily great, we must have f (s) = fo~2~ log s = K log s. Constant K = fo~2~ is positive, since f is strictly increasing and f(l) = f(1°) = Of (1) = o. For any natural number n, we define f (n) = H (~, ... , ~). We will demonstrate that f(sm) = mf(s). Symmetry of H and condition 3 imply that 8.5 Shannon Entropy and Information 1 1 11 11 179 1 H(-, ... , ) = H(-, ... ,-) +s· -H(-l' -l)' sm sm s S S sm- ... ' smwhich is to say that f(sm) = f(s) + f(sm-l). Claim f(sm) = mf(s) follows now by induction. Since f is strictly increasing and non-negative by 2, it follows that f(s) = Klogs for some positive constant K. Let Pi, ... , Pn be some rational numbers such that Pl + ... + Pn = 1. We can assume that Pi = 71, where N = ml + ... + m n . Using 3 we get 1 1 = H(N'···' N) 111 1 = H(Pl· -,···,Pl· -,P2· -,.··,···,···,Pn·-) , ml v ml m2 ' ------......-- mn ~ Hence, n = -K(LPilogmi -logN) i=l n n = -K(LPilogmi - LPilog N ) i=l n i=l = -KLPilogpi, i=l which demonstrates that the theorem is true for rational probabilities. Since H is continuous, the result extends to real numbers straightforwardly. 0 8.5.2 Information We will hereafter ignore the constant K in the definition of entropy. Let X = {Xl, ... ,xn} be a set of n elements, and P(Xl)' ... , p(xn) the corresponding probabilities of elements to occur. Similarly, let Y be a set with m elements with a probability distribution p(yt}, ... , P(Ym). Such sets are here identified with discrete random variables: we regard X as a variable which has n potential values Xl, ... , Xn, any such with probability P(Xi). The joint entropy of sets (random variables) X and Y is simply defined as the entropy of set X x Y, where the corresponding probability distribution is the joint distribution P(Xi, Yj): 180 8. Appendix B: Mathematical Background n m H(X, Y) = - L LP(Xi, Yj) logp(xi' Yj)· i=l j=l The conditional entropy of X, when the value of Y is known to be Yj, is defined by merely replacing the distribution P(Xi) with conditional probability distribution p(xiIYj): n H(XIYj) = - LP(XiIYj) logp(xiIYj). i=l The conditional entropy of X when Y is known is defined as the expected value m H(XIY) = LP(Yj)H(XIYj). j=l By using the concavity of the function log, one can quite easily prove the following lemma. Lemma 8.5.1. H(XIY) :::; H(X). Remark 8.5.5. The above lemma is quite natural: it states that the knowledge of Y cannot increase the uncertainty about X. Conditional and joint entropy are connected in the following lemma, which has a straightforward proof if one keeps in mind that p(Xi' Yj) = p(xilYj )p(Yj)· Lemma 8.5.2. H(XIY) = H(X, Y) - H(Y). Finally, we have all the necessary tools for defining the information of X when Y is known. Definition 8.5.2. I(XIY) = H(X) - H(XIY). Definition (8.5.2) contains the natural idea that the knowledge that Y can provide about X is merely the uncertainty of X minus the uncertainty of X provided Y is known. 
According to Lemma (8.5.1), I(XIY) 2: 0 always, and trivially I(XIY) :::; HeX). To end this section, we list some properties of entropy and information. The proofs of the following lemmata are left as exercises. Lemma 8.5.3. H(X, Y) :::; HeX) + HeY). Lemma 8.5.4. Information is symmetric: IeXIY) = I(YIX). The above lemma justifies the terminology mutual information of X and Y. For the following lemma, we need some extra terminology: if X and Y are random variables, we say that X (randomly) depends on Y if there are conditional probabilities p(xiIYj) such that m p(Xi) = LP(XiIYj)p(Yj). j=l 8.6 Exercises 181 Lemma 8.5.5. Let X be a random variable that depends on Y, which is a random variable depending on Z. If X is conditionally independent of Z, then J(ZIX) :S J(ZIY). 8.6 Exercises 1. Prove that a set S <;; V is linearly dependent if and only if for some element XES, x E L(S\ {x}). 2. Show that if Band B' are two bases of a H n , then necessarily IBI = IB'I. 3. a) Let n ~ 2 and Xl, X2 E Hn. Show that y E Hn can be chosen in such a way that L(XI' y) = L(XI' X2) and (Xl I y) = O. b) Generalize a) into a procedure for finding an orthonormal basis of Hn (the procedure is called the Gram-Schmidt method). 4. Prove that function f : N ~ C defined by f(n) = ~ is in L 2 (C) but cannot be expressed as a linear combination of E = {e I, e2, e2, ... }. 5. Prove the parallelogram rule: in inner product space V, equation holds for any x, y E V. 6. Show that the subspace W of L 2 (C) generated by vectors {el' e2, e3, .. . } is not a Hilbert space (cf. Example 8.3.4). 7. Prove Lemma 8.4.1. 8. Prove Lemma 8.4.4. 9. Prove Lemmata 8.5.3 and 8.5.4. 10. Use the properties of the function log to prove Lemma 8.5.5. References 1. Andris Ambainis: A note on quantum black-box complexity of almost all Boolean functions, Information Processing Letters 71, 5-7 (1999). Electronically available at quant-ph/9811080. 9 2. E. Bach and J. Shallit: Computational number theory, MIT Press (1996). 3. Adriano Barenco, Charles H. Bennett, Richard Cleve, David P. DiVincenzo, Norman Margolus, Peter Shor, Tycho Sleator, John Smolin, and Harald Weinfurter: Elementary gates for quantum computation, Physical Review A 52:5, 3457-3467 (1995). Electronically available at quant-ph/9503016. 4. R. Beals, H. Buhrman, R. Cleve, M. Mosca, and R. de Wolf: Quantum lower bounds by polynomials, Proceedings of the 39th Annual IEEE Symposium on Foundations of Computer Science - FOCS, 352-361 (1998). Electronically available at quant-ph/9802049. 5. P. A. Benioff: Quantum mechanical Hamiltonian models of discrete processes that erase their own histories: application to Turing machines, International Journal of Theoretical Physics 21:3/4,177-202, (1982). 6. Charles H. Bennett: Logical reversibility of computation, IBM Journal of Research and Development 17, 525-532 (1973). 7. Charles H. Bennett: Time/space trade-offs for reversible computation, SIAM Journal of Computing 18, 766-776 (1989). 8. Charles H. Bennett, Ethan Bernstein, Gilles Brassard, and Umesh V. Vazirani: Strengths and weaknesses of quantum computation, SIAM Journal of Computing 26:5, 1510-1523 (1997). Electronically available at quant-ph/9701001. 9. Ethan Bernstein and Umesh Vazirani: Quantum complexity theory, SIAM Journal of Computing 26:5, 1411-1473 (1997). 10. Andre Berthiaume and Gilles Brassard: Oracle quantum computing, Proceedings of the Workshop on Physics and Computation - PhysComp'92, IEEE Press, 195-199 (1992). 11. 
References

1. Andris Ambainis: A note on quantum black-box complexity of almost all Boolean functions, Information Processing Letters 71, 5-7 (1999). Electronically available at quant-ph/9811080. (A code such as "quant-ph/9811080" refers to http://xxx.lanl.gov/abs/quant-ph/9811080 at the Los Alamos preprint archive.)
2. E. Bach and J. Shallit: Computational number theory, MIT Press (1996).
3. Adriano Barenco, Charles H. Bennett, Richard Cleve, David P. DiVincenzo, Norman Margolus, Peter Shor, Tycho Sleator, John Smolin, and Harald Weinfurter: Elementary gates for quantum computation, Physical Review A 52:5, 3457-3467 (1995). Electronically available at quant-ph/9503016.
4. R. Beals, H. Buhrman, R. Cleve, M. Mosca, and R. de Wolf: Quantum lower bounds by polynomials, Proceedings of the 39th Annual IEEE Symposium on Foundations of Computer Science - FOCS, 352-361 (1998). Electronically available at quant-ph/9802049.
5. P. A. Benioff: Quantum mechanical Hamiltonian models of discrete processes that erase their own histories: application to Turing machines, International Journal of Theoretical Physics 21:3/4, 177-202 (1982).
6. Charles H. Bennett: Logical reversibility of computation, IBM Journal of Research and Development 17, 525-532 (1973).
7. Charles H. Bennett: Time/space trade-offs for reversible computation, SIAM Journal on Computing 18, 766-776 (1989).
8. Charles H. Bennett, Ethan Bernstein, Gilles Brassard, and Umesh V. Vazirani: Strengths and weaknesses of quantum computing, SIAM Journal on Computing 26:5, 1510-1523 (1997). Electronically available at quant-ph/9701001.
9. Ethan Bernstein and Umesh Vazirani: Quantum complexity theory, SIAM Journal on Computing 26:5, 1411-1473 (1997).
10. André Berthiaume and Gilles Brassard: Oracle quantum computing, Proceedings of the Workshop on Physics and Computation - PhysComp'92, IEEE Press, 195-199 (1992).
11. Michel Boyer, Gilles Brassard, Peter Høyer, and Alain Tapp: Tight bounds on quantum searching, Fourth Workshop on Physics and Computation - PhysComp'96, Eds.: T. Toffoli, M. Biafore, J. Leão, New England Complex Systems Institute, 36-43 (1996). Electronically available at quant-ph/9605034.
12. Gilles Brassard and Peter Høyer: An exact quantum polynomial-time algorithm for Simon's problem, Proceedings of the 1997 Israeli Symposium on Theory of Computing and Systems - ISTCS'97, 12-23 (1997). Electronically available at quant-ph/9704027.
13. Gilles Brassard, Peter Høyer, and Alain Tapp: Quantum counting, Automata, Languages and Programming, Proceedings of the 25th International Colloquium - ICALP'98, Lecture Notes in Computer Science 1443, 820-831, Springer (1998). Electronically available at quant-ph/9805082.
14. Léon Brillouin: Science and information theory, 2nd edition, Academic Press (1967).
15. Paul Busch, Pekka J. Lahti, and P. Mittelstaedt: The quantum theory of measurement, Springer-Verlag (1991).
16. A. R. Calderbank and Peter W. Shor: Good quantum error-correcting codes exist, Physical Review A 54, 1098-1105 (1996). Electronically available at quant-ph/9512032.
17. Man-Duen Choi: Completely positive linear maps on complex matrices, Linear Algebra and its Applications 10, 285-290 (1975).
18. Isaac L. Chuang, Lieven M. K. Vandersypen, Xinlan Zhou, Debbie W. Leung, and Seth Lloyd: Experimental realization of a quantum algorithm, Nature 393, 143-146 (1998). Electronically available at quant-ph/9801037.
19. Juan I. Cirac and Peter Zoller: Quantum computations with cold trapped ions, Physical Review Letters 74:20, 4091-4094 (1995).
20. Henri Cohen: A course in computational algebraic number theory, Graduate Texts in Mathematics 138, Springer (1993), 4th printing 2000.
21. Wim van Dam: Two classical queries versus one quantum query, electronically available at quant-ph/9806090.
22. David Deutsch: Uncertainty in quantum measurements, Physical Review Letters 50:9, 631-633 (1983).
23. David Deutsch: Quantum theory, the Church-Turing principle and the universal quantum computer, Proceedings of the Royal Society of London A 400, 97-117 (1985).
24. David Deutsch: Quantum computational networks, Proceedings of the Royal Society of London A 425, 73-90 (1989).
25. David Deutsch, Adriano Barenco, and Artur Ekert: Universality in quantum computation, Proceedings of the Royal Society of London A 449, 669-677 (1995). Electronically available at quant-ph/9505018.
26. David Deutsch and Richard Jozsa: Rapid solution of problems by quantum computation, Proceedings of the Royal Society of London A 439, 553-558 (1992).
27. Christoph Dürr and Peter Høyer: A quantum algorithm for finding the minimum, electronically available at quant-ph/9607014.
28. Mark Ettinger and Peter Høyer: On quantum algorithms for noncommutative hidden subgroups, Proceedings of the 16th Annual Symposium on Theoretical Aspects of Computer Science - STACS 99, Lecture Notes in Computer Science 1563, 478-487, Springer (1999). Electronically available at quant-ph/9807029.
29. Edward Farhi, Jeffrey Goldstone, Sam Gutmann, and Michael Sipser: A limit on the speed of quantum computation in determining parity, Physical Review Letters 81, 5442-5444 (1998). Electronically available at quant-ph/9802045.
30. Richard P. Feynman: Simulating physics with computers, International Journal of Theoretical Physics 21:6/7, 467-488 (1982).
31. Lov K. Grover: A fast quantum mechanical algorithm for database search, Proceedings of the 28th Annual ACM Symposium on the Theory of Computing - STOC, 212-219 (1996). Electronically available at quant-ph/9605043.
32. Jozef Gruska: Quantum Computing, McGraw-Hill (1999).
33. G. H. Hardy and E. M. Wright: An introduction to the theory of numbers, 4th edition with corrections, Clarendon Press, Oxford (1971).
34. Mika Hirvensalo: On quantum computation, Ph.Lic. thesis, University of Turku (1997).
35. Mika Hirvensalo: The reversibility in quantum computation theory, Proceedings of the 3rd International Conference Developments in Language Theory - DLT'97, Ed.: Symeon Bozapalidis, Aristotle University of Thessaloniki, 203-210 (1997).
36. Loo Keng Hua: Introduction to number theory, Springer-Verlag (1982).
37. E. Knill, R. Laflamme, R. Martinez, and C.-H. Tseng: An algorithmic benchmark for quantum information processing, Nature 404, 368-370 (2000).
38. Rolf Landauer: Irreversibility and heat generation in the computing process, IBM Journal of Research and Development 5, 183-191 (1961).
39. M. Y. Lecerf: Récursive insolubilité de l'équation générale de diagonalisation de deux monomorphismes de monoïdes libres φx = ψx, Comptes Rendus de l'Académie des Sciences 257, 2940-2943 (1963).
40. Ming Li, John Tromp, and Paul Vitányi: Reversible simulation of irreversible computation, Physica D 120:1/2, 168-176 (1998). Electronically available at quant-ph/9703009.
41. Seth Lloyd: A potentially realizable quantum computer, Science 261, 1569-1571 (1993).
42. Hans Maassen and J. B. M. Uffink: Generalized entropic uncertainty relations, Physical Review Letters 60:12, 1103-1106 (1988).
43. F. J. MacWilliams and Neil J. A. Sloane: The theory of error-correcting codes, North-Holland (1981).
44. Gary L. Miller: Riemann's hypothesis and tests for primality, Journal of Computer and System Sciences 13, 300-317 (1976).
45. Michele Mosca and Artur Ekert: The hidden subgroup problem and eigenvalue estimation on a quantum computer, Quantum Computing and Quantum Communications, Proceedings of the 1st NASA International Conference, Lecture Notes in Computer Science 1509, 174-188, Springer (1998). Electronically available at quant-ph/9903071.
46. A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone: Handbook of applied cryptography, Series on Discrete Mathematics and Its Applications, CRC Press (1997).
47. Masanao Ozawa: Quantum Turing machines: local transitions, preparation, measurement, and halting problem, electronically available at quant-ph/9809038 (1998).
48. Christos H. Papadimitriou: Computational complexity, Addison-Wesley (1994).
49. K. R. Parthasarathy: An introduction to quantum stochastic calculus, Birkhäuser, Basel (1992).
50. R. Paturi: On the degree of polynomials that approximate symmetric Boolean functions, Proceedings of the 24th Annual ACM Symposium on the Theory of Computing - STOC, 468-474 (1992).
51. Max Planck: Annalen der Physik 1, 69 (1900); Verhandlungen der Deutschen Physikalischen Gesellschaft 2, 202 (1900); ibid. 2, 237 (1900); Annalen der Physik 4, 553 (1901).
52. M. B. Plenio and P. L. Knight: Realistic lower bounds for the factorization time of large numbers on a quantum computer, Physical Review A 53:5, 2986-2990 (1996). Electronically available at quant-ph/9512001.
53. E. L. Post: The two-valued iterative systems of mathematical logic, Princeton University Press (1941).
54. E. L. Post: A variant of a recursively unsolvable problem, Bulletin of the American Mathematical Society 52, 264-268 (1946).
55. John Preskill: Robust solutions to hard problems, Nature 391, 631-632 (1998).
56. Marcel Riesz: Sur les maxima des formes bilinéaires et sur les fonctionnelles linéaires, Acta Mathematica 49, 465-497 (1926).
57. Yurii Rogozhin: Small universal Turing machines, Theoretical Computer Science 168, 215-240 (1996).
58. Sheldon M. Ross: Introduction to probability models, 4th edition, Academic Press (1985).
59. J. Barkley Rosser and Lowell Schoenfeld: Approximate formulas for some functions of prime numbers, Illinois Journal of Mathematics 6:1, 64-94 (1962).
60. Walter Rudin: Functional analysis, 2nd edition, McGraw-Hill (1991).
61. Keijo Ruohonen: Reversible machines and Post's correspondence problem for biprefix morphisms, EIK - Journal of Information Processing and Cybernetics 21:12, 579-595 (1985).
62. Arto Salomaa: Public-key cryptography, Texts in Theoretical Computer Science - An EATCS Series, 2nd edition, Springer (1996).
63. Peter W. Shor: Algorithms for quantum computation: discrete logarithms and factoring, Proceedings of the 35th Annual IEEE Symposium on Foundations of Computer Science - FOCS, 124-134 (1994).
64. Peter W. Shor: Scheme for reducing decoherence in quantum computer memory, Physical Review A 52:4, 2493-2496 (1995).
65. Daniel R. Simon: On the power of quantum computation, Proceedings of the 35th Annual IEEE Symposium on Foundations of Computer Science - FOCS, 116-123 (1994).
66. Douglas R. Stinson: Cryptography - Theory and practice, Series on Discrete Mathematics and Its Applications, CRC Press, Boca Raton (1995).
67. W. Tittel, J. Brendel, H. Zbinden, and N. Gisin: Violation of Bell inequalities by photons more than 10 km apart, Physical Review Letters 81:17, 3563-3566 (1998). Electronically available at quant-ph/9806043.
68. Tommaso Toffoli: Bicontinuous extensions of invertible combinatorial functions, Mathematical Systems Theory 14, 13-23 (1981).
69. B. L. van der Waerden: Sources of quantum mechanics, North-Holland (1967).
70. C. P. Williams and S. H. Clearwater: Explorations in quantum computing, Springer (1998).
71. C. P. Williams and S. H. Clearwater: Ultimate zero and one. Computing at the quantum frontier, Springer (2000).
72. William K. Wootters and Wojciech H. Zurek: A single quantum cannot be cloned, Nature 299, 802-803 (1982).
73. Andrew Chi-Chih Yao: Quantum circuit complexity, Proceedings of the 34th Annual IEEE Symposium on Foundations of Computer Science - FOCS, 352-361 (1993).
Index

BPP 18-20, 31
BQP 31
EQP 31
NP 18, 31
NQP 31
P 16, 20
R 15
RE 15
RP 18-20, 31
ZPP 18-20, 31

accepting computation 14
adjoint matrix 8
adjoint operator 108
algorithm 15
alphabet 13
amplitude 7, 106, 115

basis 163
basis state 8, 115
basis states VI
Benioff, Paul 1
Bennett, Charles 32
Bernstein, Ethan 1, 33
Bézout's identity 150
bijective morphism 152
binary alphabet 34
binary quantum gate 24
black body 103
black-box function 74-76
Bohr, Niels 104
Boltzmann's constant 32
Boolean circuit 15, 34
Born, Max 105
bra-vector 126

Cauchy-Schwarz inequality 165
causality principle 116, 135
character 154
character group 154
Chinese Remainder Theorem 150
Church-Turing thesis 15
circuit 13
commutator 121
complete 107, 166
completely positive 135, 136
compound systems 115
computation 14
computational basis 22
computational step 14
concatenation 13
concave function 177
conditional entropy 180
configuration 13, 21
configuration space 17
congruence 147
constructive interference 106
continued fraction 169
controlled not 25
convergent 170
convex combination 127
coset 146
coset product 149
cyclic group 154

de Broglie, Louis 105
decidability 15
decision problem 15
decomposable 24, 115
decomposable state 11
degenerate 110
density matrix 126
density operator 127
destructive interference 106
deterministic Turing machine 13
Deutsch, David 1, 2, 33
dimension 163
Dirac, Paul 105
direct product 153, 155
direct sum 155, 166
discrete Fourier transform 158
discrete logarithm 69
distance 166
distance function 166
dual group 154
dual space 126

eigenspace 110
eigenvector 108
Einstein, Albert 104
elementary row operations 70
entangled 24, 115
entangled state 11
entropic uncertainty relation 123
entropy 175
epimorphism 152
EPR pair 25
equivalent states 115
Euclid's algorithm 167
Euler's φ-function 150
Euler's constant 59
Euler's theorem 151
expected value 120

factor group 148
fast Fourier transform 49
Fermat's little theorem 151
Feynman, Richard 1, 27, 33
Fibonacci numbers 171
Fourier transform 41, 42, 49, 154, 158-161
Fourier transform decomposition 43
Fréchet, Maurice 126
functional 126

Gauss-Jordan elimination 71
general linear group 146
generator matrix 64
Gram-Schmidt method 181
group axioms 145
group morphisms 152
group theory 145
Grover's search algorithm 73, 87

Hadamard matrix 44
Hadamard transform 44, 159
Hadamard-Walsh matrix 23, 113
Hadamard-Walsh transform 44, 159
halting computation 14
Hamilton operator 113, 118
Hamming weight 95
Heisenberg picture 135
Heisenberg's uncertainty principle 122
Heisenberg, Werner 105
Hertz 104
Hilbert space 7, 11, 22, 24, 27, 166

image 152
index 147
information VI, 175
injective morphism 152
inner product 156, 164
inner product space 164
internal control 13
inverse element 145
inverse Fourier transform 160
inversion 145
inversion about average 82

Jordan, Paul 105

kernel 152
ket-vector 126
Kronecker product 25

Lagrange's theorem 147
Landauer, Rolf 32
Las Vegas algorithm 19
Lecerf, Yves 32
letter 13
linear combination 162
linear dependence 162
linear mapping 166
literal 73

MacWilliams, F. J. 2
Markov chain 6
Markov matrix 6
Markov, Andrei 6
Maxwell, James Clerk 104
mixed state 4, 127
monomorphism 152
Monte Carlo algorithm 19
multitape Turing machine 20

natural basis 163
natural numbers 145
neutral element 145
No-cloning Theorem 27
nondeterministic Turing machine 18
norm 108, 156, 165
normal subgroup 147
normed space 165

observable 3, 118
observation VI, 22, 24
operator 108, 166
opposite element 145
order 149
orthogonal complement 108, 165
orthogonality 164
orthogonality relations 158

parallelogram rule 166
Parseval's identity 42, 159
Pauli, Wolfgang 105
period 161
permutation matrix 38
phase shift matrix 114
phase space 3
photoelectric effect 104
photon 104
pivot 71
Planck's constant 103
Planck, Max 103
polarization equation 117
positive operator 109
positive operator-valued measure 120
Post, Emil 32, 34
Preskill, John 2
principal character 154
probabilistic Turing machine 16
probability distribution 122
product character 154
projection 109, 153
projection-valued measure 119
pure state 4, 127

quantum bit 8, 22
quantum computation VI
quantum computer VI
quantum Fourier transform 41, 42
quantum information VI
quantum physics V
quantum register 24, 26, 116
quantum time evolution 116
quantum Turing machines 30
qubit 22, 116
query operator, modified 80
quotient group 148

random variable 179
range 152
Rayleigh, John 103
recursive solvability 15
reduced row-echelon form 71
rejecting computation 14
relativity theory V
reversible circuits 36
reversible gate 36
Riesz, Frigyes 126
Riesz, Marcel 123
rotation matrix 114
row-echelon form 71
Ruohonen, Keijo 32

scalar multiplication 161
Schrödinger equation 118, 128
Schrödinger picture 135
search problems 73
self-adjoint operator 105, 109
Shannon entropy 122, 175, 177
Shannon, Claude 2
Shor, Peter 2, 33
Simon's promise 63
span 162
spectral representation 110, 113
spectrum 110
state VI, 3, 7, 22, 24, 115, 127
state space 7
state vector 3
state vectors 105
subgroup 146
subspace 163
successor 14
superposition VI, 8, 31, 115
surjective morphism 152

tensor product 10, 25
time complexity 16
time evolution 135
Toffoli, Tommaso 38
trace 108
tracing over 131
tractability 16
transition function 13
transversal 65
trivial character 154
trivial subgroup 147
Turing machine V, 13, 14

uncertainty principle 121
undecidability 15
unit element 145
unitary matrix 8
unitary operator 109
universal Turing machine 33

variance 121
Vazirani, Umesh 1, 33
vector space 161
vector space axioms 161
vector states 127
von Neumann-Lüders measurement 29

Walsh transform 44, 159
wave function 8
Wien, Wilhelm 103
Wootters, W. K. 27
word 13

Yao, Andrew 39
Young 104

zero-knowledge protocol 73
Zurek, W. H. 27