Natural Computing Series
Series Editors: G. Rozenberg (Managing)
Th. Bäck, A.E. Eiben, J.N. Kok, H.P. Spaink
Leiden Center for Natural Computing
Advisory Board: S. Amari, G. Brassard, M. Conrad,
K.A. De Jong, C.C.A.M. Gielen, T. Head, L. Kari,
L. Landweber, T. Martinetz, Z. Michalewicz, M.C. Mozer,
E. Oja, J. Reif, H. Rubin, A. Salomaa, M. Schoenauer,
H.-P. Schwefel, D. Whitley, E. Winfree, J.M. Zurada
Springer-Verlag Berlin Heidelberg GmbH
Mika Hirvensalo
Quantum Computing
With 5 Figures
Springer
Author
Mika Hirvensalo
Department of Mathematics
University of Turku
20014 Turku, Finland
mikhirve@cs.utu.fi
Series Editors
G. Rozenberg (Managing Editor)
Th. Bäck, A.E. Eiben, J.N. Kok, H.P. Spaink
Leiden Center for Natural Computing
Leiden University
Niels Bohrweg 1
2333 CA Leiden, The Netherlands
rozenber@cs.leidenuniv.nl
Library of Congress Cataloging-in-Publication Data
Quantum computing
p.cm.
ISBN 978-3-662-04463-6
ISBN 978-3-662-04461-2 (eBook)
DOI 10.1007/978-3-662-04461-2
1. Quantum computers.
QA76.889.Q82 2001
2001020738
004.1--dc21
ACM Computing Classification (1998): F.1-2, G.1.2, G.3, H.1.1, I.1.2, J.2
ISBN 978-3-662-04463-6
This work is subject to copyright. All rights are reserved, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data
banks. Duplication of this publication or parts thereof is permitted only under the
provisions of the German Copyright Law of September 9, 1965, in its current version, and
permission for use must always be obtained from Springer-Verlag Berlin Heidelberg GmbH.
Violations are liable for prosecution under the German Copyright Law.
http://www.springer.de
© Springer-Verlag Berlin Heidelberg 2001
Originally published by Springer-Verlag Berlin Heidelberg New York in 2001
Softcover reprint of the hardcover 1st edition 2001
The use of general descriptive names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
Cover Design: KünkelLopka, Heidelberg
Typesetting: Camera ready by the author
Printed on acid-free paper
SPIN 10723406
45/3142PS - 5 4 3 2 1 0
Preface
The twentieth century witnessed the birth of revolutionary ideas in the physical sciences. These ideas began to shake the traditional view of the universe
dating back to the days of Newton, even to the days of Galileo. Albert Einstein is usually identified as the creator of the relativity theory, a theory that
is used to model the behavior of the huge macrosystems of astronomy. Another new view of the physical world was supplied by quantum physics, which
turned out to be successful in describing phenomena in the microworld, the
behavior of particles of atomic size.
Even though the first ideas of automatic information processing are quite
old, I feel justified in saying that the twentieth century also witnessed the
birth of computer science. As a mathematician, by the term "computer science", I mean the more theoretical parts of this vast research area, such as
the theory of formal languages, automata theory, complexity theory, and algorithm design. I hope that readers who are used to a more flexible concept of
"computer science" will forgive me. The idea of a computational device was
crystallized into a mathematical form as a Turing machine by Alan Turing
in the 1930s. Since then, the growth of computer science has been immense,
but many problems in newer areas such as complexity theory are still waiting
for a solution.
Since the very first electronic computers were built, computer technology
has grown rapidly. An observation by Gordon Moore in 1965 laid the foundations for what became known as "Moore's Law": that computer processing
power doubles every eighteen months. How far can this technical progress go?
How efficient can we make computers? In light of the present knowledge, it
seems unfair even to attempt to give an answer to these questions, but some
estimate can be given. By naively extrapolating Moore's law to the future,
we learn that sooner or later, each bit of information should be encoded by
a physical system of subatomic size! Several decades ago such an idea would
have seemed somewhat absurd, but it does not seem so anymore. In fact, a
system of seven bits encoded subatomically has been already implemented
[37]. These small systems can no longer be described by classical physics, but
rather quantum physical effects must be taken into consideration.
When thinking again about the formalization of a computer as a Turing
machine, rewriting system, or some other classical model of computation, one
realizes that the concept of information is usually based on strings over a finite
alphabet. This strongly reflects the idea of classical physics in the following
sense: each member of a string can be represented by a physical system
(storing the members in the memory of an electronic computer, writing them
on sand, etc.) that can be in a certain state; i.e., contain a character of the
alphabet. Moreover, we should be able to identify different states reliably.
That is, we should be able to make an observation in such a way that we
become convinced that the system under observation represents a certain
character.
In this book, we typically identify the alphabet and the distinguishable
states of a physical system that represent the information. These identifiable states are called basis states. In quantum physical microsystems, there
are also basis states that can be identified and, therefore, we could use such
microsystems to represent information. But, unlike the systems of classical
physics, these microsystems are also able to exist in a superposition of basis
states, which, informally speaking, means that the state of such a system can
also be a combination of basis states. We will call the information represented
by such microsystems quantum information. One may argue that in classical physics it is also possible to speak about combinations of basis states:
we can prepare a mixed state which is essentially a probability distribution
of the basis states. But there is a difference between the superpositions of
quantum physics and the probability distributions of classical physics: due to
the interference effects, the superpositions cannot be interpreted as mixtures
(probability distributions) of the basis states.
Richard Feynman [30] pointed out in 1982 that it appears to be extremely
difficult by using an ordinary computer to simulate efficiently how a quantum physical system evolves in time. He also demonstrated that, if we had a
computer that runs according to the laws of quantum physics, then this simulation could be made efficiently. Thus, he actually suggested that a quantum
computer could be essentially more efficient than any traditional one.
Therefore, it is an interesting challenge to study quantum computation,
the theory of computation in which traditional information is replaced by
its quantum physical counterpart. Are quantum computers more powerful
than traditional ones? If so, what are the problems that can be solved more
efficiently by using a quantum computer? These questions are still waiting
for answers.
The purpose of this book is to provide a good introduction to quantum
computation for beginners, as well as a clear presentation of the most important presently known results for more advanced readers. The latter purpose
also includes providing a bridge (from a mathematician's point of view) between quantum mechanics and the theory of computation: it is not only my
personal experience that the language used in research articles on these topics
is completely different.
Chapter 1, especially Section 1.4, includes the most basic knowledge for
the presentation of quantum systems relevant to quantum computation.
Chapter 2 is divided as follows: Turing machines, as well as their probabilistic counterparts, are introduced in Section 2.1 as traditional models of
computation. For a reader interested in the research on quantum computation but having little knowledge of the theory of computation, this section
is designed to also include the basic definitions of complexity theory. In Section 2.2, the knowledge on quantum systems is deepened and the quantum
counterparts of the computational models are presented. Sections 2.1 and 2.2
are intended for a reader who has a solid background in quantum mechanics,
but little previous knowledge on classical computation theory. A reader who
is well aware of the theory of computation may skip Section 2.1 as well: for
such a reader the knowledge in Sections 2.2 and 2.3 (excluding the first subsection) is sufficient to follow this book. In Section 2.3, we represent Boolean
and quantum circuits (as an extension of the concept of reversible circuits)
as models of computation. Because of their descriptional simplicity, circuits
are used throughout this book to present quantum algorithms.
Chapter 3 is devoted to a complete representation of Shor's famous factorization algorithm. The instructions in Chapter 3 will help the reader to choose
which sections may, according to the reader's background, be skipped. Chapter
4 is closely connected to Chapter 3, and can be seen as a representation of a
structural but not straightforward extension of Shor's algorithm.
Chapter 5 is written to introduce Grover's method for obtaining a
quadratic speedup in quantum computation (with respect to classical computation) in a very basic form, whereas in Chapter 6, we represent a method
for obtaining lower bounds for quantum computation in a restricted quantum
circuit model.
Chapters 7 and 8 are appendices intended for a beginner, but Chapter 7
is also suitable for a reader who has a strong background in computer science
and is interested in quantum computation. Chapter 8 is composed of various topics in mathematics, since it has already turned out that, in
the area of quantum computation, many mathematical disciplines, seemingly
separate from each other, are useful. Moreover, my personal experience is
that a basic education on computer science and physics very seldom covers
all the areas in Chapter 8.
This book concentrates mainly on quantum algorithms; other interesting topics, such as quantum information theory, quantum communication, quantum error correction, and quantum cryptography, are not covered.
Therefore, for additional reading, we can warmly recommend [32] by Jozef
Gruska. Book [32] also contains a large number of references to works on quantum computing. A reader who is more oriented to physics may also see [70]
and [71] by C. P. Williams and S. H. Clearwater. It may also be useful to follow
the Los Alamos preprint archive at http://xxx.lanl.gov/archive/quant-ph to
learn about the new developments in quantum computing. The Los Alamos
preprint archive contains a large number of articles on quantum computing
since 1994, and includes many articles referred to in this book.
For further information on this book, I am maintaining a Web page, in
particular for the collection of errata and any other comments from the readers. The link to this page is available via the Springer catalogue Web site:
http://www.springer.de/cgi-bin/search_book.pl?isbn=3-540-66783-0
Acknowledgements. My warmest thanks go to Professors Grzegorz Rozenberg and Arto Salomaa for encouraging me to write this book, and to Professor Juhani Karhumäki and Turku Centre for Computer Science for providing
excellent working conditions during the writing period. This work has been
supported by the Academy of Finland under grants 14047 and 44087. I am
also indebted to Docent P. J. Lahti, V. Halava, and M. Rönkä for their careful
revision work on parts of this book. Thanks also go to Dr. Hans Wössner for
a fruitful cooperation.
Turku, Finland, February 2001
Mika Hirvensalo
Contents
1. Introduction
   1.1 A Brief History of Quantum Computation
   1.2 Classical Physics
   1.3 Probabilistic Systems
   1.4 Quantum Mechanics

2. Devices for Computation
   2.1 Classical Computational Models
       2.1.1 Turing Machines
       2.1.2 Probabilistic Turing Machines
       2.1.3 Multitape Turing Machines
   2.2 Quantum Information
       2.2.1 Quantum Bits
       2.2.2 Quantum Registers
       2.2.3 More on Quantum Information
       2.2.4 Quantum Turing Machines
   2.3 Circuits
       2.3.1 Boolean Circuits
       2.3.2 Reversible Circuits
       2.3.3 Quantum Circuits

3. Fast Factorization
   3.1 Quantum Fourier Transform
       3.1.1 General Framework
       3.1.2 Hadamard-Walsh Transform
       3.1.3 Quantum Fourier Transform in $\mathbb{Z}_n$
       3.1.4 Complexity Remarks
   3.2 Shor's Algorithm for Factoring Numbers
       3.2.1 From Periods to Factoring
       3.2.2 Orders of the Elements in $\mathbb{Z}_n$
       3.2.3 Finding the Period
   3.3 The Correctness Probability
       3.3.1 The Easy Case
       3.3.2 The General Case
       3.3.3 The Complexity of Shor's Factorization Algorithm
   3.4 Exercises

4. Finding the Hidden Subgroup
   4.1 Generalized Simon's Algorithm
       4.1.1 Preliminaries
       4.1.2 The Algorithms
   4.2 Examples
       4.2.1 Finding the Order
       4.2.2 Discrete Logarithm
       4.2.3 Simon's Original Problem
   4.3 Exercises

5. Grover's Search Algorithm
   5.1 Search Problems
       5.1.1 Satisfiability Problem
       5.1.2 Probabilistic Search
       5.1.3 Quantum Search with One Query
   5.2 Grover's Amplification Method
       5.2.1 Quantum Operators for Grover's Search Algorithm
       5.2.2 Amplitude Amplification
       5.2.3 Analysis of Amplification Method
   5.3 Utilizing Grover's Search Method
       5.3.1 Searching with Unknown Number of Solutions

6. Complexity Lower Bounds for Quantum Circuits
   6.1 General Idea
   6.2 Polynomial Representations
       6.2.1 Preliminaries
       6.2.2 Bounds for the Representation Degrees
   6.3 Quantum Circuit Lower Bound
       6.3.1 General Lower Bound
       6.3.2 Some Examples

7. Appendix A: Quantum Physics
   7.1 A Brief History of Quantum Theory
   7.2 Mathematical Framework for Quantum Theory
       7.2.1 Hilbert Spaces
       7.2.2 Operators
       7.2.3 Spectral Representation of Self-adjoint Operators
       7.2.4 Spectral Representation of Unitary Operators
   7.3 Quantum States as Hilbert Space Vectors
       7.3.1 Quantum Time Evolution
       7.3.2 Observables
       7.3.3 The Uncertainty Principles
   7.4 Quantum States as Operators
       7.4.1 Density Matrices
       7.4.2 Subsystem States
       7.4.3 More on Time Evolution
       7.4.4 Representation Theorems
   7.5 Exercises

8. Appendix B: Mathematical Background
   8.1 Group Theory
       8.1.1 Preliminaries
       8.1.2 Subgroups, Cosets
       8.1.3 Factor Groups
       8.1.4 Group $\mathbb{Z}_n^*$
       8.1.5 Group Morphisms
       8.1.6 Direct Product
   8.2 Fourier Transforms
       8.2.1 Characters of Abelian Groups
       8.2.2 Orthogonality of the Characters
       8.2.3 Discrete Fourier Transform
       8.2.4 The Inverse Fourier Transform
       8.2.5 Fourier Transform and Periodicity
   8.3 Linear Algebra
       8.3.1 Preliminaries
       8.3.2 Inner Product
   8.4 Number Theory
       8.4.1 Euclid's Algorithm
       8.4.2 Continued Fractions
   8.5 Shannon Entropy and Information
       8.5.1 Entropy
       8.5.2 Information
   8.6 Exercises

References

Index
1. Introduction
1.1 A Brief History of Quantum Computation
In connection with computational complexity, it could be stated that the theory of quantum computation was launched at the beginning of the 1980s. The famous physicist and Nobel Prize winner Richard P. Feynman proposed in his article [30], which appeared in 1982, that a quantum physical system of $R$ particles cannot be simulated by an ordinary computer without an exponential slowdown in the efficiency of the simulation. However, a system of $R$ particles in classical physics can be simulated well with only a polynomial slowdown. The main reason for this is that the description size of a particle system is linear in $R$ in classical physics,¹ but exponential in $R$ according to quantum physics (in Section 1.4 we will learn about the quantum physical description).

¹ One should give the coordinates and the momentum of each particle with required precision.
Feynman himself expressed:

But the full description of quantum mechanics for a large system with R particles is given by a function $\psi(x_1, x_2, \ldots, x_R, t)$ which we call the amplitude to find the particles $x_1, \ldots, x_R$, and therefore, because it has too many variables, it cannot be simulated with a normal computer with a number of elements proportional to R or proportional to N. [30]
Feynman also suggested that this slowdown could be avoided by using a computer running according to the laws of quantum physics. This idea suggests, at
least implicitly, that a quantum computer could operate exponentially faster
than a deterministic classical one. In [30], Feynman also addressed the problem of simulating a quantum physical system with a probabilistic computer, but, due to interference phenomena, this also appears to be difficult.
Quantum mechanical computation models were also constructed by Benioff [5] in 1982, but Deutsch argued in [23] that Benioff's model can be perfectly simulated by an ordinary computer. In 1985, in his notable paper [23], Deutsch was the first to establish a solid ground for the theory of quantum computation by introducing a fully quantum model for computation and giving the description of a universal quantum computer. Later, Deutsch also defined quantum networks in [24]. The construction of a universal quantum Turing machine was improved by Bernstein and Vazirani in [9], where the authors show how to construct a universal quantum Turing machine capable of simulating any other quantum Turing machine with polynomial efficiency.
After the pioneering work of D. Deutsch, quantum computation still remained a marginal curiosity in the theory of computation until 1994, when
Peter W. Shor introduced his celebrated quantum algorithms for factoring
integers and extracting discrete logarithms in polynomial time [63]. The importance of Shor's algorithm for finding factors is well known: the reliability of
the famous RSA cryptosystem designed for secret communications is based on
the assumption that the factoring of large integers will remain an intractable
problem, but Shor demonstrated that this is not true if one could build a
quantum computer.
However, the theory is far more developed than the practice: a large-scale
quantum computer has not been built yet. A major difficulty arises from two
somehow contradictory requirements. On the one hand the computer memory consisting of a microscopically small quantum system must be isolated as
perfectly as possible to prevent destructive interaction with the environment.
On the other hand, the "quantum processing unit" cannot be totally isolated,
since the computation must carry on, and a "supervisor" should ensure that
the quantum system evolves in the desired way. In principle, the problem
that non-controllable errors occur is not a new one: in classical information
theory we consider a noisy channel that may corrupt the messages. The task
of the receiver is to extract the correct information from the distorted message without any additional information transmission. The classical theory of
error-correcting codes [43] addresses this problem, and the elementary result
of this theory, originally due to Claude Shannon, could be expressed as follows: for a reasonably erratic channel, there is a coding system of messages
which allows us to decrease the error probability in the transmission as much
as we want.
It was first believed that a corresponding scheme for quantum computations was impossible even in theory, mainly because of the No-cloning Theorem [72], which states that quantum information cannot be duplicated exactly. However, Shor demonstrated in his article [64] how we can construct
error-correcting schemes for quantum computers, thus establishing the theory of quantum error-correcting codes. To learn more about this theory, the
reader may consult [16], for example. In article [55], J. Preskill emphasizes
that quantum error-correcting codes may someday lead to the construction
of a large-scale quantum computer, but this remains to be seen.
1.2 Classical Physics
Physics, as we understand it today, is the theory of overall nature. This
theory is naturally too broad to be profoundly accessed in a brief moment,
so we are satisfied just to point out some essential features that will be
important when learning about the differences between quantum and classical
computation.
At its very core, physics is ultimately an empirical science in the sense that a physical theory can be regarded as valid only if the theory agrees with empirical observations. Therefore, it is not surprising that the concept of observables² has great importance in the physical sciences. There are observables associated with a physical system, like position and momentum, to
mention a few. The description of a physical system is called the state of the system.

² In classical mechanics, observables are usually called dynamic variables.
Example 1.2.1. Assume that we would like to describe the mechanics of a
single particle X in a closed region of space. The observables used for the
system description are the position and the momentum. Thus, we can, under a fixed coordinate system, express the state of the system as a vector $x = (x_1, x_2, x_3, p_1, p_2, p_3) \in \mathbb{R}^6$, where $(x_1, x_2, x_3)$ describes the position and $(p_1, p_2, p_3)$ the momentum.
As the particle moves, the state of the system changes in time. The way
in which classical mechanics describes the time evolution of the state is given
by the Hamiltonian equations of motion:
$$\frac{d}{dt} x_i = \frac{\partial}{\partial p_i} H, \qquad \frac{d}{dt} p_i = -\frac{\partial}{\partial x_i} H,$$

where $H = H(x_1, x_2, x_3, p_1, p_2, p_3)$ is an observable called the Hamiltonian function of the system.
Suppose now that another particle Y is inserted into our system. Then the full description of the system is given as a vector $(x, y) \in \mathbb{R}^6 \times \mathbb{R}^6$, where $x$ is as before, and $y$ describes the position and the momentum of the second particle.
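To make the Hamiltonian time evolution concrete, the following small numerical sketch (ours, not from the book) integrates the equations of motion for a one-dimensional harmonic oscillator with the assumed Hamiltonian $H(x, p) = p^2/(2m) + kx^2/2$, using the symplectic Euler method:

    # A sketch, not from the book: integrating Hamilton's equations for a
    # one-dimensional harmonic oscillator, H(x, p) = p**2/(2*m) + k*x**2/2.
    m, k = 1.0, 1.0        # constant properties: mass and spring constant
    x, p = 1.0, 0.0        # the state (x, p), a point of the phase space
    dt = 0.001             # discretized time step

    for _ in range(10_000):        # evolve the state for 10 time units
        p -= dt * k * x            # dp/dt = -dH/dx = -k*x
        x += dt * p / m            # dx/dt =  dH/dp =  p/m
    print(x, p)                    # the state after the evolution

The map from the initial state to the final state computed by the loop is an instance of the time-dependency $U_t$ introduced in the list below.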
To build the mathematical description of a physical system, it is sufficient to
assume that all the observables take their values in a set whose cardinality
does not exceed the cardinality of the real number system. Therefore, we
can assume that the observables take real number values; if it is natural to
think that a certain observable takes values in some other set A, we can
always replace the elements of A with suitable real numbers. Keeping this
agreement in mind, we can list the properties of a description of a physical system.

• A physical system is described by a state vector (also called state) $x \in \mathbb{R}^k$ for some $k$. The set of states is called the phase space of the system. The state is the full description of the dynamic properties of interest. This means that the state completely describes the properties whose development we actually want to describe. The other physical properties, such as electrical charge, temperature, etc., are not included in the state if they do not alter or do not affect the properties of interest. It should be emphasized here that, in the previous example, we were interested in describing the mechanics of a particle, so the properties of interest are its position and momentum. However, it may be argued that if we investigate a particle in an electrical field, the electrical charge of the particle definitely affects its behaviour: the greater the charge, the greater the acceleration. But in this case, the charge is seen as a constant property of the system and is not encoded in the state.
• The state depends on time, so instead of $x$ we should actually write $x(t)$ or $x_t$. If the system is regular enough, as classical mechanics is, a state also determines the future states (and also the past states). That is, we can find a time-dependency which is a function $U_t$ such that $x(t) = U_t(x(0))$. This $U_t$, of course, depends on the system itself, as well as on the constant properties of the system. In our first example, $U_t$ is determined by the Hamiltonian via the Newtonian equations of motion.

• If two systems are described by states $x$ and $y$, the state of the compound system consisting of both systems can be written as $(x, y)$. That is, the description of the compound system is a Cartesian product of the subsystems.
1.3 Probabilistic Systems
In order to be able to comment on some characteristic features of the representation of quantum systems, we first study the representation of a probabilistic
system. A system admitting a probabilistic nature means that we do not know for certain the state of the system, but we do know the probability distribution of the states. In other words, we know that the system is in states $x_1, \ldots, x_n$ with probabilities $p_1, \ldots, p_n$ that sum up to 1 (if there is a continuum of possible states as in Example 1.2.1, we should have a probability distribution that integrates up to 1, but for the sake of simplicity, we will study here only systems having finitely many states). Notation

$$p_1[x_1] + p_2[x_2] + \cdots + p_n[x_n], \tag{1.1}$$

where $p_i \ge 0$ and $p_1 + \cdots + p_n = 1$, stands for a probability distribution, meaning that the system is in state $x_i$ with probability $p_i$. We also call distribution (1.1) a mixed state. Hereafter, states $x_i$ are called pure states.
Now make a careful distinction between the notations: expression (1.1) does not mean the expectation value (also called the average) of the state,

$$p_1 x_1 + p_2 x_2 + \cdots + p_n x_n, \tag{1.2}$$

but (1.1) is only the probability distribution over states $x_i$.
Example 1.3.1. Tossing a fair coin will give head $h$ or tail $t$ with a probability of $\frac{1}{2}$. According to classical mechanics, we may think that, in principle, perfect knowledge about the coin and all circumstances connected to the tossing will allow us to determine the outcome with certainty. However, in practice it is impossible to handle all these circumstances, and the notation $\frac{1}{2}[h] + \frac{1}{2}[t]$ for the mixed state of a fair coin reflects our lack of information about the system.
Using an auxiliary system, we can easily introduce a probabilistic nature
for any system. In fact, this is what we usually do in connection with probabilistic algorithms: the algorithm itself is typically thought to be strictly
deterministic, but it utilizes random bits, which are supposed to be freely
available.³

³ In classical computation, generating random bits is a very complicated issue. For further discussion on this topic, consult Section 11.3 of [48].
Example 1.3.2. Let us assume that the time evolution of a system with pure states $x_1, \ldots, x_n$ also depends on an auxiliary system with pure states $h$ and $t$, such that the compound system state $(x_i, h)$ evolves during a fixed time interval into $(x_{h(i)}, h)$ and $(x_i, t)$ into $(x_{t(i)}, t)$, where $h, t: \{1, \ldots, n\} \to \{1, \ldots, n\}$ are some functions. The auxiliary system with states $h$ and $t$ can thus be interpreted as a control system which indicates how the original one should behave.

Let us then consider the control system in a mixed state $p_1[h] + p_2[t]$ (if $p_1, p_2 \neq \frac{1}{2}$, we may call the control system a biased coin). The compound state can then be written as a mixture

$$p_1[(x_i, h)] + p_2[(x_i, t)],$$

which evolves into a mixed state

$$p_1[(x_{h(i)}, h)] + p_2[(x_{t(i)}, t)].$$

If the auxiliary system no longer interferes with the first one, we can ignore it and write the state of the first system as

$$p_1[x_{h(i)}] + p_2[x_{t(i)}].$$

A control system in a mixed state is called a randomizer. Supposing that the randomizers are always available, we may even assume that the system under consideration evolves probabilistically and ignore the randomizer.
Using a more general concept of randomizer, we can achieve notational and conceptual simplification by assuming that the time evolution of a system is not a deterministic procedure, but develops each state $x_i$ into a distribution

$$p_{1i}[x_1] + p_{2i}[x_2] + \cdots + p_{ni}[x_n] \tag{1.3}$$
such that $p_{1i} + p_{2i} + \cdots + p_{ni} = 1$ for each $i$. In (1.3), $p_{ji}$ is the probability that the system state $x_i$ evolves into $x_j$. Notice that we have now also made the time discrete in order to simplify the mathematical framework, and that this actually is well-suited to the computational aspects: we can assume that we have instantaneous descriptions of the system between short time intervals, and that during each interval, the system undergoes the time evolution (1.3). Of course, the time evolution does not always need to be the same, but rather may depend on the particular interval. Considering this probabilistic time evolution, the notation (1.1) appears to be very handy: during a time interval, a distribution

$$p_1[x_1] + p_2[x_2] + \cdots + p_n[x_n] \tag{1.4}$$

evolves into

$$\begin{aligned}
& p_1(p_{11}[x_1] + \cdots + p_{n1}[x_n]) + \cdots + p_n(p_{1n}[x_1] + \cdots + p_{nn}[x_n]) \\
&= (p_{11}p_1 + \cdots + p_{1n}p_n)[x_1] + \cdots + (p_{n1}p_1 + \cdots + p_{nn}p_n)[x_n] \\
&= p_1'[x_1] + p_2'[x_2] + \cdots + p_n'[x_n],
\end{aligned}$$

where we have denoted $p_i' = p_{i1}p_1 + \cdots + p_{in}p_n$. The probabilities $p_i$ and $p_i'$ are thus related by

$$\begin{pmatrix} p_1' \\ p_2' \\ \vdots \\ p_n' \end{pmatrix}
= \begin{pmatrix}
p_{11} & p_{12} & \cdots & p_{1n} \\
p_{21} & p_{22} & \cdots & p_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
p_{n1} & p_{n2} & \cdots & p_{nn}
\end{pmatrix}
\begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_n \end{pmatrix}. \tag{1.5}$$

Notice that the matrix in (1.5) has non-negative entries and $p_{1i} + p_{2i} + \cdots + p_{ni} = 1$ for each $i$, which guarantees that $p_1' + p_2' + \cdots + p_n' = p_1 + p_2 + \cdots + p_n$. A matrix with this property is called a Markov matrix. A probabilistic system with a time evolution described above is called a Markov chain.
Notice that a distribution (1.4) can be considered as a vector with non-negative coordinates that sum up to 1 in an $n$-dimensional real vector space having $[x_1], \ldots, [x_n]$ as basis vectors. The set of all mixed states (distributions) is a convex set,⁴ having the pure states as extremals. Unlike in the representation of quantum systems, the fact that the mixed states are elements of a vector space is not very important. However, it may be convenient to describe the probabilistic time evolution as a Markov mapping, i.e., a linear mapping which preserves the property that the coordinates are non-negative and sum up to 1. For an introduction to Markov chains, see [58], for instance.

⁴ A set $S$ of vectors is convex if, for all $x_1, x_2 \in S$ and all $p_1, p_2 \ge 0$ such that $p_1 + p_2 = 1$, also $p_1x_1 + p_2x_2 \in S$. An element $x \in S$ of a convex set is an extremal if $x = p_1x_1 + p_2x_2$ implies that either $p_1 = 0$ or $p_2 = 0$.
1.4 Quantum Mechanics
In the beginning of the twentieth century, experiments on atoms and radiation
physics gave strong support to the idea that there are physical systems that
cannot be satisfactorily described even by using the Markov chain representation of the previous section. In Section 7.1 we will discuss these experiments
in more detail, but here in the introductory chapter we are satisfied with
only presenting the mathematical description of a quantum mechanical system. We would like to emphasize that the representation studied here, based on so-called state vectors, is not the most general representation of
quantum systems. A general Hilbert space formalism of quantum mechanics
is represented in Section 7.4. The advantage of a representation using only
state vectors is that it is mathematically simpler than the more general one.
To explain the terminology, we usually use "quantum mechanics" to mean
the mathematical structure that describes "quantum physics", which can be
understood more generally.
The quantum mechanical description of a physical system looks very much like the probabilistic representation

$$p_1[x_1] + p_2[x_2] + \cdots + p_n[x_n], \tag{1.6}$$

but still differs essentially from (1.6).
In fact, in quantum mechanics a state of an $n$-level system is depicted as a unit-length vector in an $n$-dimensional complex vector space $H_n$ (see Section 8.3).⁵ We call $H_n$ the state space of the system. To illustrate, let us choose an orthonormal basis $\{|x_1\rangle, \ldots, |x_n\rangle\}$ for the state space $H_n$ (we assume here that $n$ is a finite number). This strange "ket"-notation $|x\rangle$ is originally due to P. Dirac, and its usefulness will become apparent later. Now, any state of our quantum system can then be written as

$$\alpha_1 |x_1\rangle + \alpha_2 |x_2\rangle + \cdots + \alpha_n |x_n\rangle, \tag{1.7}$$

where the $\alpha_i$ are complex numbers called the amplitudes (with respect to the chosen basis) and the requirement of unit length means that $|\alpha_1|^2 + |\alpha_2|^2 + \cdots + |\alpha_n|^2 = 1$.

⁵ A finite-dimensional vector space over complex numbers is an example of a Hilbert space.

It should be immediately emphasized that the choice of the orthonormal basis $\{|x_1\rangle, \ldots, |x_n\rangle\}$ is arbitrary, but any such fixed basis refers to a physical observable which can take $n$ values. To simplify the framework, we do not associate any numerical values with the observables in this section, but we merely say that the system can have properties $x_1, \ldots, x_n$. Numerical values associated to the observables are handled in Section 7.3.2. The amplitudes $\alpha_i$ induce a probability distribution in the following way: the probability that a system in a state (1.7) is seen to have property $x_i$ is $|\alpha_i|^2$ (we also say that the
probability that $x_i$ is observed is $|\alpha_i|^2$). The basis vectors $|x_i\rangle$ are called the basis states and (1.7) is referred to as a superposition of basis states. Mapping $\psi(x_i) = \alpha_i$ is called the wave function with respect to basis $|x_1\rangle, \ldots, |x_n\rangle$.

Even though this state vector formalism is simpler than the general one, it has one inconvenient feature: if $|x\rangle, |y\rangle \in H_n$ are any states that satisfy $|x\rangle = e^{i\theta} |y\rangle$, we say that states $|x\rangle$ and $|y\rangle$ are equivalent. Clearly, equivalent states induce the same probability distribution over basis states (with respect to any chosen basis), and we usually identify equivalent states.
Example 1.4.1. A two-level quantum mechanical system can be used to represent a bit, and such a system is called a quantum bit. For such a system we choose an orthonormal basis $\{|0\rangle, |1\rangle\}$, and a general state of the system is

$$\alpha_0 |0\rangle + \alpha_1 |1\rangle,$$

where $|\alpha_0|^2 + |\alpha_1|^2 = 1$. The above superposition induces a probability distribution such that the probabilities that the system is seen to have properties 0 and 1 are $|\alpha_0|^2$ and $|\alpha_1|^2$, respectively.
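A minimal numerical sketch of Example 1.4.1 (ours, assuming NumPy; not from the book): the state is a unit vector in $\mathbb{C}^2$, and the squared moduli of its amplitudes form the distribution of observations.

    import numpy as np

    alpha = np.array([1 / np.sqrt(2), 1j / np.sqrt(2)])  # a_0|0> + a_1|1>
    assert np.isclose(np.sum(np.abs(alpha) ** 2), 1.0)   # unit-length check

    probs = np.abs(alpha) ** 2                   # probabilities of 0 and 1
    outcome = np.random.choice([0, 1], p=probs)  # a single observation
    print(probs, outcome)                        # [0.5 0.5] and a random bit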
What about the time evolution of quantum systems? The time evolution of a probabilistic system via Markov matrices is replaced by matrices with complex number entries that preserve the quantity $|\alpha_1|^2 + \cdots + |\alpha_n|^2$. Thus, the quantum system in a state

$$\alpha_1 |x_1\rangle + \cdots + \alpha_n |x_n\rangle$$

evolves during a time interval into the state

$$\alpha_1' |x_1\rangle + \cdots + \alpha_n' |x_n\rangle,$$

where the amplitudes $\alpha_1, \ldots, \alpha_n$ and $\alpha_1', \ldots, \alpha_n'$ are related by

$$\begin{pmatrix} \alpha_1' \\ \alpha_2' \\ \vdots \\ \alpha_n' \end{pmatrix}
= \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix} \tag{1.8}$$
and $|\alpha_1'|^2 + \cdots + |\alpha_n'|^2 = |\alpha_1|^2 + \cdots + |\alpha_n|^2$. It turns out that the matrices satisfying this requirement are unitary matrices (see Section 7.3.1 for details). Unitarity of a matrix $A$ means that the transpose complex conjugate of $A$, denoted by $A^*$, is the inverse matrix of $A$. Matrix $A^*$ is also called the adjoint matrix⁶ of $A$. By saying that the time evolution of quantum systems is unitary, several authors mean that the evolution of quantum systems is determined by unitary matrices. Notice that unitary time evolution has very
interesting consequences: unitarity especially means that the time evolution of a quantum system is invertible. In fact, $(\alpha_1, \ldots, \alpha_n)$ can be perfectly recovered from $(\alpha_1', \ldots, \alpha_n')$, since the matrix in (1.8) has an inverse.

⁶ Among physics-oriented authors, there is also a widespread tradition of denoting the adjoint matrix by $A^\dagger$.
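Both consequences can be checked numerically; the following sketch (ours, assuming NumPy; not from the book) verifies that a unitary matrix preserves $|\alpha_1|^2 + \cdots + |\alpha_n|^2$ and that its adjoint undoes the evolution:

    import numpy as np

    A = np.array([[1, 1],
                  [1, -1]]) / np.sqrt(2)      # a unitary 2x2 matrix
    a = np.array([0.6, 0.8j])                 # a unit-length state vector

    b = A @ a                                 # one evolution step, as in (1.8)
    print(np.linalg.norm(a), np.linalg.norm(b))  # both norms equal 1
    print(np.allclose(A.conj().T @ b, a))     # A* = A^{-1} recovers the state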
We interrupt the description of quantum systems for a moment to discuss
the differences between probabilistic and quantum systems. As we have said
before, a mixed state (probability distribution)

$$p_1[x_1] + \cdots + p_n[x_n] \tag{1.9}$$

of a probabilistic system and a superposition

$$\alpha_1 |x_1\rangle + \cdots + \alpha_n |x_n\rangle \tag{1.10}$$

of a quantum system formally resemble each other very closely, and therefore it is quite natural to ask if (1.10) can be seen as a generalization of (1.9). At first glance, the interpretation that (1.10) induces the probability distribution such that $|\alpha_i|^2$ is the probability of observing $x_i$ may only seem like a technical difference. Can (1.10) also be interpreted to represent our ignorance such that the system in state (1.10) is actually in some state $|x_i\rangle$ with a probability of $|\alpha_i|^2$? The answer is absolutely no. A fundamental difference can be found even by recalling that the orthonormal basis of $H_n$ can be chosen freely. We can, in fact, choose an orthonormal basis $|x_1'\rangle, \ldots, |x_n'\rangle$ such that

$$|x_1'\rangle = \alpha_1 |x_1\rangle + \cdots + \alpha_n |x_n\rangle,$$

so, with respect to the new basis, the state of the system is simply $|x_1'\rangle$. The new basis refers to another physical observable, and with respect to this observable, the system may have some of the properties $x_1', \ldots, x_n'$. But in the state $|x_1'\rangle$, the system is seen to have property $x_1'$ with a probability of 1.
Example 1.4.2. Consider a quantum system with two basis states $t$ and $h$ (of course this system is the same as a quantum bit, only the terminology is different for illustration). Here we call this system a quantum coin. We consider a time evolution

$$|h\rangle \mapsto \frac{1}{\sqrt{2}} |h\rangle + \frac{1}{\sqrt{2}} |t\rangle,$$
$$|t\rangle \mapsto \frac{1}{\sqrt{2}} |h\rangle - \frac{1}{\sqrt{2}} |t\rangle,$$

which we here call a fair coin toss (verify that the time evolution is unitary). Notice that, beginning with either state $|h\rangle$ or $|t\rangle$, the state after the toss is one of the above states on the right-hand side, and that both of them have the property that $h$ and $t$ will both be observed with a probability of $\frac{1}{2}$. Imagine then that we begin with state $|h\rangle$ and perform the fair coin toss twice. After the first toss, the state is as above, but if we do not observe the system, the
state will be $|h\rangle$ again after the second toss (verify). The phenomenon that $t$ cannot be observed after the second toss clearly cannot take place in any probabilistic system.
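The interference in Example 1.4.2 is easy to reproduce numerically. In the sketch below (ours, assuming NumPy; not from the book) the fair coin toss is a matrix, and two consecutive tosses cancel the amplitude of $|t\rangle$ completely:

    import numpy as np

    H = np.array([[1, 1],
                  [1, -1]]) / np.sqrt(2)   # the fair coin toss (unitary)
    h = np.array([1.0, 0.0])               # basis state |h>

    after_one = H @ h
    after_two = H @ after_one              # H is its own inverse: H @ H = I
    print(np.abs(after_one) ** 2)          # [0.5 0.5]: h and t equally likely
    print(np.abs(after_two) ** 2)          # [1. 0.]: t is never observed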
Remark 1.4.1. In quantum mechanics, a state (1.10) is a pure state: it contains the maximal information of the system properties. The mixed states of
quantum systems are discussed in Section 7.4.
Let us now continue the description of quantum systems by introducing two systems with basis states $|x_1\rangle, \ldots, |x_n\rangle$ and $|y_1\rangle, \ldots, |y_m\rangle$, respectively. As in the case of classical and probabilistic systems, the basis states of the compound system can be thought of as pairs $(|x_i\rangle, |y_j\rangle)$. It is natural to represent the general states of the compound system as

$$\sum_{i=1}^{n} \sum_{j=1}^{m} \alpha_{ij} |x_i, y_j\rangle, \tag{1.11}$$

where $\sum_{i=1}^{n} \sum_{j=1}^{m} |\alpha_{ij}|^2 = 1$. A natural mathematical framework for representations (1.11) is the tensor product of the state spaces of the subsystems, and now we will briefly explain what is meant by a tensor product.
Let $H_n$ and $H_m$ be two vector spaces with bases $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_m\}$. The tensor product of spaces $H_n$ and $H_m$ is denoted by $H_n \otimes H_m$. Space $H_n \otimes H_m$ has the ordered pairs $(x_i, y_j)$ as a basis; thus $H_n \otimes H_m$ has dimension $mn$. We also denote $(x_i, y_j) = x_i \otimes y_j$ and say that $x_i \otimes y_j$ is the tensor product of basis vectors $x_i$ and $y_j$. Tensor products of vectors other than basis vectors are defined by requiring that the product is bilinear:

$$\left( \sum_{i=1}^{n} \alpha_i x_i \right) \otimes \left( \sum_{j=1}^{m} \beta_j y_j \right) = \sum_{i=1}^{n} \sum_{j=1}^{m} \alpha_i \beta_j \, x_i \otimes y_j.$$

Since the vectors $x_i \otimes y_j$ form the basis of $H_n \otimes H_m$, the notion of the tensor product of vectors is perfectly well established, but notice carefully that the tensor product is not commutative. In connection with representing quantum systems, we usually omit the symbol $\otimes$ and use notations closer to the original idea of regarding $x_i \otimes y_j$ as a pair $(x_i, y_j)$:

$$|x_i\rangle \otimes |y_j\rangle = |x_i\rangle |y_j\rangle = |x_i, y_j\rangle.$$
If (1.11) can be represented as

$$\sum_{i=1}^{n} \sum_{j=1}^{m} \alpha_{ij} |x_i, y_j\rangle = \left( \sum_{i=1}^{n} \alpha_i |x_i\rangle \right) \left( \sum_{j=1}^{m} \beta_j |y_j\rangle \right),$$

we say that the compound system is in a decomposable state. Otherwise, the system state is entangled. It is plain to verify that the notion of a "decomposable state" is independent of the bases chosen for each space.
Since $H_l \otimes (H_m \otimes H_n)$ is clearly isomorphic to $(H_l \otimes H_m) \otimes H_n$, we can omit the parentheses and refer to this tensor product as $H_l \otimes H_m \otimes H_n$. Thus, we can inductively define the tensor products of more than two spaces.
Example 1.4.3. A system of $m$ quantum bits is described by an $m$-fold tensor product of the two-dimensional Hilbert space $H_2$. Thus the system has basis states $|x_1\rangle |x_2\rangle \cdots |x_m\rangle$, where $(x_1, \ldots, x_m) \in \{0, 1\}^m$. The dimension of this space is $2^m$, and the system of $m$ qubits is referred to as the quantum register of length $m$.
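In coordinates, the tensor product of state vectors is the Kronecker product, so the construction of Example 1.4.3 can be sketched as follows (ours, assuming NumPy; not from the book):

    import numpy as np

    zero = np.array([1.0, 0.0])            # |0>
    one = np.array([0.0, 1.0])             # |1>

    product = np.kron(zero, one)           # decomposable state |0>|1>
    bell = (np.kron(zero, zero) + np.kron(one, one)) / np.sqrt(2)  # entangled
    print(product, bell)                   # vectors of a 4-dimensional space
    print(np.kron(np.kron(zero, zero), zero).size)  # length-3 register: 2**3 = 8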
We conclude this section by listing the key features of a finite-state quantum system in the state-vector formalism.

• The basis states of an $n$-level quantum system form an orthonormal basis of the system state space, Hilbert space $H_n$. This basis can be chosen freely and it refers to a particular observable.

• A state of a quantum system is a unit-length vector

$$\alpha_1 |x_1\rangle + \cdots + \alpha_n |x_n\rangle. \tag{1.12}$$

These states are called superpositions of basis states, and they induce a probability distribution such that when one observes (1.12), a property $x_i$ is seen with probability $|\alpha_i|^2$.

• If two quantum systems are depicted by using state spaces $H_n$ and $H_m$, then the state space of the compound system consisting of these two subsystems is described by the tensor product $H_m \otimes H_n$.

• The states of quantum systems change via unitary transformations.
2. Devices for Computation
To study computational processes we have to fix a computational device
first. In this chapter, we study Turing machines and circuits as models of
computation. We use the standard notations of formal language theory and
represent these notations now briefly. An alphabet is any set $A$. The elements of an alphabet $A$ are called letters. The concatenation of sets $A$ and $B$ is a set $AB$ consisting of strings formed of any element of $A$ followed by any element of $B$. Especially, $A^k$ is the set of strings of length $k$ over $A$. These strings are also called words. The concatenation $w_1w_2$ of words $w_1$ and $w_2$ is just the word $w_1$ followed by $w_2$. The length of a word $w$ is denoted by $|w|$ or $\ell(w)$ and defined as the number of the letters that constitute $w$. We also define $A^0$ to be the set that contains only the empty word $\epsilon$ having no letters, and $A^* = A^0 \cup A^1 \cup A^2 \cup \cdots$ is the set of all words over $A$. Mathematically speaking, $A^*$ is the free monoid generated by the elements of $A$, having the concatenation as the monoid operation and $\epsilon$ as the unit element.
2.1 Classical Computational Models
2.1.1 Turing Machines
The very classical model to describe computational processes is the Turing
machine, or TM for short. To give a description of a Turing machine, we fix a finite set of basic information units called the alphabet $A$, a finite set of internal control states $Q$, and a transition function

$$\delta: Q \times A \to Q \times A \times \{-1, 0, 1\}. \tag{2.1}$$

Definition 2.1.1. A (deterministic) Turing machine $M$ over alphabet $A$ is a sixtuple $(Q, A, \delta, q_0, q_a, q_r)$, where $q_0, q_a, q_r \in Q$ are the initial, accepting, and rejecting states, respectively, and $\delta$ is as above.
The reason for calling the Turing machine of the above definition deterministic is due to the form of the transition function $\delta$, which describes the dynamics (the computation) of the machine. Later we will introduce some other forms of Turing machines, but for now, we will give an interpretation for the above definition, mainly using the notations of [48]. As a configuration of a Turing
machine we understand a triplet $(q_1, x, y)$, where $q_1 \in Q$ and $x, y \in A^*$. For a configuration $c = (q_1, x, y)$, we say that the Turing machine is in state $q_1$ and that the tape contains word $xy$. If word $y$ begins with letter $a_1$, we say that the machine is scanning letter $a_1$ or reading letter $a_1$. To describe the dynamics determined by the transition function $\delta$, we write $x = w_1a$ and $y = a_1w_2$, where $a, a_1 \in A$ and $w_1, w_2 \in A^*$. Thus, $c$ can be written as $c = (q_1, w_1a, a_1w_2)$, and the transition $\delta$ also defines a transition rule from one configuration to another in a very natural way: if $\delta(q_1, a_1) = (q_2, a_2, d)$, then the configuration

$$(q_1, w_1a, a_1w_2)$$

can be transformed into

$$(q_2, w_1, aa_2w_2), \quad (q_2, w_1a, a_2w_2), \quad \text{or} \quad (q_2, w_1aa_2, w_2),$$

depending on whether $d = -1$, $d = 0$, or $d = 1$, respectively. This means that, under transition $\delta(q_1, a_1) = (q_2, a_2, d)$, the symbol $a_1$ being scanned is replaced with $a_2$ (we say that the machine prints $a_2$ to replace $a_1$); the machine enters state $q_2$, and begins to scan the symbol to the left of the previously scanned symbol, continues scanning the same location, or scans the symbol to the right of the previously scanned symbol, depending on $d \in \{-1, 0, 1\}$. We also say that the Turing machine's read-write head moves to the left, remains stationary, or moves to the right.
If $\delta$ defines a transition from $c$ to $c'$, we write $c \vdash c'$ and say that $c'$ is a successor of $c$, that $c$ yields $c'$, and that $c$ is followed by $c'$. The transition from $c$ to $c'$ is called a computational step. The anomalous cases $c = (q_1, w, \epsilon)$ and $c = (q_1, \epsilon, w)$ (recall that $\epsilon$ was defined to be the empty word having no letters at all) are treated by introducing a blank symbol $\sqcup$, extending the definition of $\delta$, and replacing $(q_1, w, \epsilon)$ (resp. $(q_1, \epsilon, w)$) with $(q_1, w, \sqcup)$ (resp. $(q_1, \sqcup, w)$) if necessary.

A computation of a Turing machine with input $w \in A^*$ is defined as a sequence of configurations $c_0, c_1, c_2, \ldots$ such that $c_0 = (q_0, \epsilon, w)$ and $c_i \vdash c_{i+1}$ for each $i$. We say that the computation halts if some $c_i$ has no successors, or if the state symbol of the configuration $c_i$ is either $q_a$ or $q_r$. In the former case, the computation is accepting, and in the latter case it is rejecting.
If a computation of a Turing machine $T$ beginning with configuration $(q_0, \epsilon, w)$ leads into a halting configuration $(q, w_1, w_2)$ in $t$ computational steps, we say that $T$ computes $w_1w_2$ from the input word $w$ in time $t$. Thus, a Turing machine can be seen as a device computing (partial) functions $A^* \to A^*$; but we can also ignore the output $w_1w_2$ and just say that the Turing machine $T$ accepts the input $w$ if it halts in an accepting state, and that $T$ rejects the input $w$ if it either does not halt or halts in a rejecting state. Thus, a Turing machine can also be seen as an acceptor that classifies the words into those which are accepted and those which are rejected.
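The configuration semantics just described is short enough to be executed directly. The following toy sketch (ours, not from the book) simulates a Turing machine that accepts exactly the nonempty words over $\{a, b\}$ beginning with $a$:

    # (state, scanned symbol) -> (new state, printed symbol, head move)
    delta = {
        ('q0', 'a'): ('qa', 'a', 0),   # first letter a: accept
        ('q0', 'b'): ('qr', 'b', 0),   # first letter b: reject
    }

    def run(word):                           # nonempty input over {a, b}
        q, x, y = 'q0', '', word + '_'       # '_' plays the role of the blank
        while q not in ('qa', 'qr'):         # halt in accepting/rejecting state
            q, a2, d = delta[(q, y[0])]
            if d == 1:                       # head moves right
                x, y = x + a2, y[1:]
            elif d == 0:                     # head stays put
                y = a2 + y[1:]
            else:                            # head moves left (the anomalous
                x, y = x[:-1], x[-1] + a2 + y[1:]  # empty-x case is ignored)
        return q == 'qa'

    print(run('ab'), run('ba'))              # True False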
Definition 2.1.2. A set $S$ of words over $A$ is a recursively enumerable language if there is a Turing machine $T$ such that $T$ accepts $w$ if and only if $w \in S$.

Definition 2.1.3. A set $S$ is a recursive language if there is a Turing machine $T$ such that each computation of $T$ halts, and $T$ accepts $w$ if and only if $w \in S$.

Families of recursively enumerable and recursive languages are denoted by RE and R, respectively. By definition, R $\subseteq$ RE, but it is a well-known fact that also R $\neq$ RE; see [48], for instance.
It is a widespread belief that Turing machines capture the notion of algorithmic computability: whatever is algorithmically computable is also computable by a Turing machine. This strong belief should actually be called
a hypothesis, which is known as the Church-Turing thesis. In this book we
establish the notion of algorithm by using Turing machines. Turing machines
provide a clear notion for computational resources like the time and the space
used during computation. Moreover, by using Turing machines, it is easy to
introduce the notions of probabilistic, nondeterministic and quantum computation. On the other hand, it is worth arguing that, for almost all practical
purposes, the notion of the Turing machine is clumsy: even to describe very
simple algorithms by using Turing machines requires a lengthy and fairly
non-intuitive description of the transition rules. For the above reasons, we
will represent the fundamental concepts of computability by using Turing
machines, but we will also use more sophisticated notions of algorithms including Boolean circuits, quantum circuits and even pseudo-programming
languages.
The above definitions of recursive and recursively enumerable languages
represent two classes of algorithmic decision problems. In a decision problem, we are given a word $w$ as input and we should decide whether $w$ belongs to some particular language or not. It is clear that,
for recursive languages, when the Turing machine accepts the particular language, it offers an algorithm for solving this problem. However, for recursively
enumerable languages the corresponding Turing machine does not provide a
good algorithmic solution. The reason for this is that even though the machine halts when the input w belongs to S, we do not know how long the
computation lasts on individual input words $w$. Thus, we do not know how long we should wait until we know that the machine will not halt, i.e., the decision $w \notin S$ cannot be made. We call all decision problems inside R algorithmically solvable, recursively solvable, or decidable. The decision problems
outside R are called recursively unsolvable or undecidable.
The classification into recursively solvable and unsolvable problems is
nowadays far too rough. A typical way to introduce some refinement is to consider the computational time required to solve a particular problem. Therefore, we now define some basic concepts connected to measuring the computation time. In complexity theory, we are usually interested in measuring
the computation time only up to some multiplicative constant, and for that
purpose, the following notations are useful:
Let $f$ and $g$ be functions $\mathbb{N} \to \mathbb{N}$. We write $f = O(g)$ if there are constants $c > 0$ and $n_0 \in \mathbb{N}$ such that $f(n) \le cg(n)$ whenever $n \ge n_0$. If $f(n) \ge cg(n)$ whenever $n \ge n_0$, we write $f = \Omega(g)$. If $f = O(g)$ and $f = \Omega(g)$, then we write $f = \Theta(g)$ and say that $f$ and $g$ are of the same order.
Let $M$ be a Turing machine that halts on each input and let $T(n)$ be the maximum computation time on inputs of length $n$. We say that $T(n)$ is the time complexity function of $M$. A Turing machine $M$ is a polynomial-time Turing machine if its time complexity function satisfies $T(n) = O(n^k)$ for some constant $k$. The family of decision problems that can be solved by polynomial-time Turing machines is denoted by P. Using widespread jargon, we say that the decision problems in P are tractable, while the other ones are intractable.
2.1.2 Probabilistic Turing Machines
We can modify the definition of a deterministic Turing machine to provide a model for probabilistic computation. The necessary modification is to replace the transition function with a transition probability distribution

$$\delta: Q \times A \times Q \times A \times \{-1, 0, 1\} \to [0, 1].$$

Value $\delta(q_1, a_1, q_2, a_2, d)$ is interpreted as the probability that, when the machine in state $q_1$ reads symbol $a_1$, it will print $a_2$, enter the state $q_2$, and move the head in direction $d \in \{-1, 0, 1\}$.

Definition 2.1.4. A probabilistic Turing machine $M$ over alphabet $A$ is a sixtuple $(Q, A, \delta, q_0, q_a, q_r)$, where $q_0, q_a, q_r \in Q$ are the initial, accepting, and rejecting states, respectively. It is required that, for all $(q_1, a_1) \in Q \times A$,

$$\sum_{(q_2, a_2, d) \in Q \times A \times \{-1, 0, 1\}} \delta(q_1, a_1, q_2, a_2, d) = 1.$$
From this time on, to avoid fundamental difficulties in the notion of computability, we will agree that all the values $\delta(q_1, a_1, q_2, a_2, d)$ are rational.¹

¹ This agreement can, of course, be criticized, but the feasibility of a machine working with arbitrary real number probabilities is also questionable.

The computation of a probabilistic Turing machine is not so straightforward a concept as the deterministic computation. We say that the configuration $c = (q_1, w_1a, a_1w_2)$ yields (or is followed by) any configuration $c' = (q_2, w_1, aa_2w_2)$ (resp. $c' = (q_2, w_1a, a_2w_2)$ and $c' = (q_2, w_1aa_2, w_2)$) with probability $p = \delta(q_1, a_1, q_2, a_2, -1)$ (resp. with probability $p = \delta(q_1, a_1, q_2, a_2, 0)$ and $p = \delta(q_1, a_1, q_2, a_2, 1)$). If $c$ yields $c'$ with probability $p$, we write $c \vdash_p c'$. Let $c_0, c_1, \ldots, c_t$ be a sequence of configurations such that $c_i \vdash_{p_{i+1}} c_{i+1}$ for
each $i$. Then we say that $c_t$ is computed from $c_0$ in $t$ steps with probability $p_1p_2 \cdots p_t$. If $p_1p_2 \cdots p_t \neq 0$, we also say that $c_0 \vdash_{p_1} c_1 \vdash_{p_2} \cdots \vdash_{p_t} c_t$ is a computation of a probabilistic Turing machine.
Remark 2.1.1. A reader who thinks that the notion of probabilistic computation based on probabilistic Turing machines does not correspond very well
to the idea of an algorithm utilizing random bits is perfectly right! In many
ways a simpler model for probabilistic computations could have been given
by using nondeterministic Turing machines. The reason for presenting this
model is to make the traditional definition of a quantum Turing machine (see
e.g. [9]) clearer.
Unlike in a deterministic computation, a single configuration can now
yield several different configurations with probabilities that sum up to 1.
Thus, the total computation of a probabilistic Turing machine can be seen
as a tree having configurations as nodes. The initial configuration is the root, and each node $c$ has the configurations that follow $c$ as descendants. Thus, a
computation of the probabilistic machine in question is just a single branch
in the total computation tree. All computations can be expressed in the terms of the probabilistic systems we considered in the introductory chapter: assume that, taking $t$ steps, a probabilistic Turing machine starting from an initial configuration computes configurations $c_1, \ldots, c_m$ with probabilities $p_1, \ldots, p_m$ such that $p_1 + \cdots + p_m = 1$. We can then say that the total configuration of a probabilistic machine at time $t$ is a probability distribution

$$p_1[c_1] + p_2[c_2] + \cdots + p_m[c_m] \tag{2.2}$$
over the basis configurations cI, ... , em. As in the introductory chapter, we
can define a vector space, which we now call the configuration space, that has
all of the potential basis configurations as the basis vectors, and a general
configuration (2.2) is a vector in the configuration space having non-negative
coordinates that sum up to 1. On the other hand, there is now an essentially
more complicated feature compared to the introductory chapter: there is a
countable infinity of basis vectors, so our configuration space is countably
infinite-dimensional.
Regarding this latter representation of total configurations of a probabilistic Turing machine, the reader may already guess how to introduce the
notion of quantum Turing machines, but we will still continue with probabilistic computations and study some acceptance models.
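To make the configuration-space picture concrete, here is a minimal sketch (hypothetical Python with a toy three-configuration system, not an actual Turing machine): the transition probability distribution induces a column-stochastic matrix on the configuration space, and one step of the whole computation tree is one matrix application.

```python
import numpy as np

# Toy configuration space {c0, c1, c2}; entry T[j, i] is the probability
# that configuration ci yields cj in one step. Each column sums to 1,
# mirroring the requirement on the transition probability distribution.
T = np.array([[0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

# Initial total configuration: the machine starts in c0 with certainty.
d = np.array([1.0, 0.0, 0.0])

for step in range(3):
    d = T @ d            # one step of the total computation
    print(step + 1, d)   # non-negative coordinates summing to 1
```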
When talking about time-bounded computation in connection with probabilistic Turing machines, we usually assume that all the computations have
the same length. In other words, we assume that all the computations are
synchronized so well that they reach a halting configuration at the same time,
so that all the branches of the computation tree have the same length. We
can thus regard a probabilistic Turing machine as a facility for computing
the probability distribution of outcomes. For instance, if the purpose of a
particular probabilistic machine is to solve a decision problem, then some of
the computations may end up in an accepting state, but some may also end
up in a rejecting state. What then do we mean by saying that a probabilistic
machine accepts a string or a language? In fact, there are several different
choices, some of which are given in the following definitions:
• Class NP is the family of languages S that can be accepted in polynomial time by some probabilistic Turing machine M in the following sense: a word w is in the language S if and only if M accepts w with nonzero probability (see Remark 2.1.2).
• Class RP is the family of languages S that can be accepted in polynomial time by some probabilistic Turing machine M in the following way: if w ∈ S, then M accepts w with a probability of at least 1/2, but if w ∉ S, then M always rejects w.
• Class coRP is the class of languages consisting exactly of the complements of those in RP. Notice that coRP is not the complement of RP among all the languages.
• We define ZPP = RP ∩ coRP.
• Class BPP is the family of languages S that are accepted by a probabilistic Turing machine M such that if w ∈ S, then M accepts with a probability of at least 2/3, and if w ∉ S, then M rejects with a probability of at least 2/3.
Remark 2.1.2. The definition of the class NP (standing for nondeterministic polynomial time) given here is not the usual one. A much more traditional definition would be given by using the notion of a nondeterministic Turing machine, which can be obtained from the probabilistic ones by ignoring the probabilities. In other words, a nondeterministic Turing machine has a transition relation δ ⊆ Q × A × Q × A × {−1, 0, 1}, which tells whether it is possible for a configuration c to yield another configuration c′. More precisely, the fact that (q₁, a₁, q₂, a₂, d) ∈ δ means that, if the machine is in state q₁ scanning symbol a₁, then it is possible to replace a₁ with a₂, move the head in direction d, and enter state q₂. However, this model resembles the probabilistic computation very closely: indeed, the notion of "computing c′ from c in t steps with nonzero probability" is replaced with the notion that "there is a possibility to compute c′ from c in t steps", which does not seem to make any difference. The acceptance model for nondeterministic Turing machines also looks like the one that we defined for probabilistic NP-machines. A word w is accepted if and only if it is possible to reach an accepting final configuration in polynomial time.
It can also be argued that the class NP does not correspond very well to our intuition of practical computation. For example, if each configuration yields two distinct ones, each with a probability of 1/2, and the computation lasts t steps, then there are 2^t final configurations, each computed from the initial one with a probability of 2^{-t}. However, we say that the machine accepts a word if and only if at least one of these final configurations is accepting, but it may happen that only one final configuration is accepting, and we cannot distinguish the acceptance probabilities 2^{-t} (accepting) and 0 (rejecting) in practice without running the machine Ω(2^t) times. But this would usually make the computation last exponential time since, if the machine reads the whole input, then t ≥ n, where n is the length of the input. By its very definition, each deterministic Turing machine is also a probabilistic one (always working with a probability of 1), and therefore P ⊆ NP. However, it is a long-standing open problem in theoretical computer science whether P ≠ NP.
Remark 2.1.3. Class RP (randomized polynomial time) corresponds more closely to the notion of realizable effective computation. If w ∉ S, then this fact is revealed by any computation. On the other hand, if w ∈ S, we learn this with a probability of at least 1/2. This means that an RP-Turing machine has no false positives, but it may give a false negative, however, with a probability of at most 1/2. By repeating the whole computation k times, the probability of making the false decision w ∉ S can thus be reduced to at most (1/2)^k. A probabilistic Turing machine with RP-style acceptance is also called a Monte Carlo Turing machine.
Remark 2.1.4. If a language S belongs to ZPP, then there are Monte Carlo Turing machines M₁ and M₂ accepting S and the complement of S respectively. By combining these two machines, we obtain an algorithm that can be repeatedly used to make the correct decision with certainty. For if M₁ gives the answer w ∈ S even once, it surely means that w ∈ S, since M₁ has no false positives. If M₂ gives the answer w ∈ A* \ S, then we know that w ∉ S, since M₂ has no false positives either. In both cases, the probability of a false answer is at most 1/2, so by repeating the procedure of running both machines k times, we obtain with a probability of at least 1 − (1/2)^k the certainty that either w ∈ S or w ∉ S. Thus, we can say that a ZPP-algorithm works like a deterministic algorithm whose expected running time is polynomial. The notation ZPP stands for Zero error Probability in Polynomial time. A ZPP-algorithm is also called a Las Vegas algorithm.
Remark 2.1.5. The definition of BPP merely contains the idea that we accept languages or, to put it in other words, that we solve decision problems using a probabilistic Turing machine that is required to give a correct answer with a probability that is larger than 1/2. The notation BPP stands for Bounded error Probability in Polynomial time. The constant 2/3 is arbitrary, and any c ∈ (1/2, 1) would define the same class. This is because we can efficiently increase the success probability by defining a Turing machine that runs the same computation several times and then takes the majority of the results as the answer. It is also widely believed that the class BPP is the class of problems that are efficiently solvable. This belief is known as the Extended Church-Turing thesis.
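To see how fast the majority vote amplifies the success probability, the following sketch (plain Python, not from the book; it assumes independent runs, each correct with probability 2/3) computes the exact probability that the majority of k repetitions is correct.

```python
from math import comb

def majority_correct(p, k):
    """Probability that more than k/2 of k independent runs,
    each correct with probability p, give the correct answer."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k // 2 + 1, k + 1))

# With per-run correctness 2/3, a few dozen repetitions already
# push the error probability very low (k is kept odd to avoid ties).
for k in (1, 11, 51, 101):
    print(k, majority_correct(2/3, k))
```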
Earlier in this section it was mentioned that, according to the Church-Turing thesis, Turing machines are capable of capturing the notion of computability. However, it seems that probabilistic Turing machines with any of the above acceptance models also fit very well into the notion of "algorithmic computability". Is it true that any language which is accepted by a probabilistic Turing machine, regardless of the acceptance mode, can also be accepted by an ordinary Turing machine? The answer is yes, and it can be justified as follows: since the probabilities were required to be rational, we can always simulate the computation tree of a probabilistic Turing machine with an ordinary one that sequentially performs all possible computations and computes the probabilities associated to them. In other words, knowledge of the probabilities by which a probabilistic algorithm works also allows us to compute the probability distribution deterministically.²

² The restriction to rational probabilities plays an essential role here. This restriction can be criticized but, from the practical point of view, rational probabilities should be sufficient. Some authors allow all "computable numbers" as probabilities, but this would introduce more problems, such as: Which numbers are computable? Which is the computational model to "compute these numbers"? Can the comparisons of "computable numbers" (needed, e.g., to decide whether the acceptance probability is at least 1/2) be made efficiently?
Therefore, the notion of probabilistic computation does not shatter the border between decidability and undecidability. However, any deterministic Turing machine can also be seen as a probabilistic machine having only one choice for each transition. Therefore, it is clear that

    P ⊆ RP,  P ⊆ ZPP,  and  P ⊆ BPP.    (2.3)
The simulation of a probabilistic machine by a deterministic one is done by sequentially computing all the possible computations of the probabilistic machine. Since the number of different computations may be exponential (recall Remark 2.1.2), the simulation may also take exponential time. Hence, we can ask whether probabilistic computation is more efficient than deterministic computation, i.e., whether some of the inclusions (2.3) are strict. Or is there a cunning way to deterministically imitate a probabilistic computation such that the simulation time would not grow exponentially? The answers to such questions are unknown so far.
2.1.3 Multitape Turing Machines
A straightforward generalization of a Turing machine is a multitape Turing machine. This model of computation becomes apparent when speaking about space-bounded computation.

Definition 2.1.5. A (deterministic) Turing machine with k tapes over an alphabet A is a sixtuple (Q, A, δ, q₀, q_a, q_r), where q₀, q_a, q_r ∈ Q are the initial, accepting, and rejecting states respectively, and δ : Q × A^k → Q × (A × {−1, 0, 1})^k is the transition function.
A configuration of a k-tape Turing machine is a (2k + 1)-tuple

    (q, x₁, y₁, x₂, y₂, ..., x_k, y_k),    (2.4)

where q ∈ Q and x_i, y_i ∈ A*. For a configuration (2.4) we say that q is the current state, and that x_i y_i is the content of the ith tape. Let x_i = v_i a_i and y_i = b_i w_i. Then we say that b_i is the currently scanned symbol on the ith tape. If

    δ(q, b₁, ..., b_k) = (r, (c₁, d₁), ..., (c_k, d_k)),

where r ∈ Q, c_i ∈ A, and d_i ∈ {−1, 0, 1}, then the configuration c yields

    c′ = (r, x₁′, y₁′, x₂′, y₂′, ..., x_k′, y_k′),

where (x_i′, y_i′) = (v_i, a_i c_i w_i) if d_i = −1, (x_i′, y_i′) = (v_i a_i, c_i w_i) if d_i = 0, and (x_i′, y_i′) = (v_i a_i c_i, w_i) if d_i = 1.
The concepts associated with computing, such as acceptance, rejection, computation time, etc., are defined for multitape Turing machines exactly in the same fashion as for ordinary Turing machines. But when talking about the space consumed by a computation, it seems fair to ignore the space occupied by the input and the space needed for writing the output, and merely talk about the space consumed by the computation proper. By using multitape machines, this is achieved as follows: we choose k > 2 and identify the first tape as an input tape and the last tape as an output tape. The machine must never change the contents of the input tape, and the head on the output tape can never move to the left. If, using such a model of computation, one reaches a final configuration c = (q, u₁, v₁, u₂, v₂, ..., u_k, v_k), then the space required during the computation is defined to be Σ_{i=2}^{k−1} |u_i v_i|.
Again, we may ask whether the computational capacity of multitape Turing machines is greater than that of ordinary ones. This time, we can even guarantee that the computational capacity of multitape machines is very close to the capacity of a single-tape machine: it is left as an exercise to show that whatever is computable by a multitape machine in time t is also computable by a single-tape Turing machine in time O(t²).
2.2 Quantum Information
It must first be noted that, despite the title, quantum information theory is not the topic of this section. Instead, we will merely concentrate on the descriptional differences between presenting information by using classical and
quantum systems. In fact, in this section we will present the fundamentals of
quantum information processing. A reader well aware of basic linear algebra
will presumably have no difficulties in following this section, but for a reader
feeling some uncertainty, we recommend consulting Sections 7.1, 7.2, and the
initial part of Section 7.3. Moreover, the basic notions of linear algebra are
outlined in Section 8.3.
2.2.1 Quantum Bits
Definition 2.2.1. A quantum bit, qubit for short, is a two-level quantum
system. Because there should not be any danger of confusion, we also say
that the two-dimensional Hilbert space H2 is a quantum bit. Space H2 is
equipped with a fixed basis B = {IO), II)}, a so-called computational basis.
States 10) and 11) are called basis states.
A general state of a single quantum bit is a vector

    c₀|0⟩ + c₁|1⟩    (2.5)

having unit length, i.e., |c₀|² + |c₁|² = 1.
We say that an observation of a quantum bit in state (2.5) will give 0 or 1 as an outcome with probabilities |c₀|² and |c₁|² respectively.
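This observation rule is easy to simulate; the sketch below (hypothetical numpy code, with an arbitrarily chosen state) samples outcomes for a qubit in state c₀|0⟩ + c₁|1⟩.

```python
import numpy as np

rng = np.random.default_rng(0)

# State c0|0> + c1|1> as a unit-length vector in H_2 (arbitrary example).
state = np.array([3/5, 4j/5])          # |c0|^2 = 9/25, |c1|^2 = 16/25
probs = np.abs(state) ** 2             # outcome probabilities
assert np.isclose(probs.sum(), 1.0)    # unit length

outcomes = rng.choice([0, 1], size=10_000, p=probs)
print(np.bincount(outcomes) / 10_000)  # approximately [0.36, 0.64]
```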
Definition 2.2.2. An operation on a qubit, called a unary quantum gate, is a unitary mapping U : H₂ → H₂.

In other words, a unary quantum gate defines a linear operation

    |0⟩ ↦ a|0⟩ + b|1⟩
    |1⟩ ↦ c|0⟩ + d|1⟩

such that the matrix

    ( a  b )
    ( c  d )

is unitary, i.e.,

    ( a  b ) ( a*  c* )   =   ( 1  0 )
    ( c  d ) ( b*  d* )       ( 0  1 ).
Notation a* stands for the complex conjugate of the complex number a. That the notation A* has already been used for the free monoid generated by the set A should cause no misunderstandings; the meaning of a particular *-symbol should be clear from the context. In what follows, the notation (a, b)ᵀ is used to indicate transposition, i.e., (a, b)ᵀ is the column vector with entries a and b.
Example 2.2.1. Let us use the coordinate representation |0⟩ = (1, 0)ᵀ and |1⟩ = (0, 1)ᵀ. Then the unitary matrix

    M_¬ = ( 0  1 )
          ( 1  0 )

defines an action M_¬|0⟩ = |1⟩, M_¬|1⟩ = |0⟩. The unary quantum gate defined by M_¬ is called a quantum not-gate.
Example 2.2.2. Let us examine a quantum gate defined by the unitary matrix

    √M_¬ = ( (1+i)/2  (1−i)/2 )
           ( (1−i)/2  (1+i)/2 ).

The action of √M_¬ is

    √M_¬|0⟩ = (1+i)/2 |0⟩ + (1−i)/2 |1⟩,    (2.6)
    √M_¬|1⟩ = (1−i)/2 |0⟩ + (1+i)/2 |1⟩.    (2.7)

Since |(1+i)/2|² = |(1−i)/2|² = 1/2, observations of (2.6) and (2.7) will give 0 or 1 as the outcome, both with a probability of 1/2. Because √M_¬ √M_¬ = M_¬, the gate √M_¬ is called the square root of the not-gate.
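The claims of this example can be verified numerically; a minimal numpy check (hypothetical code, not from the book):

```python
import numpy as np

sqrt_not = np.array([[1 + 1j, 1 - 1j],
                     [1 - 1j, 1 + 1j]]) / 2
m_not = np.array([[0, 1],
                  [1, 0]])

# Unitarity, and the defining property (sqrt M)(sqrt M) = M_not.
print(np.allclose(sqrt_not @ sqrt_not.conj().T, np.eye(2)))  # True
print(np.allclose(sqrt_not @ sqrt_not, m_not))               # True

# Acting on |0> = (1, 0)^T gives amplitudes (1+i)/2 and (1-i)/2,
# each with squared modulus 1/2.
print(np.abs(sqrt_not @ np.array([1, 0])) ** 2)              # [0.5 0.5]
```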
Example 2.2.3. Let us study a quantum gate W₂ defined by the matrix

    W₂ = (1/√2) ( 1   1 )
                ( 1  −1 ).

The action of W₂ is

    W₂|0⟩ = (1/√2)|0⟩ + (1/√2)|1⟩,
    W₂|1⟩ = (1/√2)|0⟩ − (1/√2)|1⟩.

Matrix W₂ is called a Walsh matrix, Hadamard matrix, or Hadamard-Walsh matrix, and it will eventually appear to be very useful. A very important feature of quantum gates that has already been mentioned implicitly is that they are linear, and therefore it suffices to know their action on the basis states. For example, if a Hadamard-Walsh gate acts on the state (1/√2)(|0⟩ + |1⟩), the outcome is

    W₂ (1/√2)(|0⟩ + |1⟩) = (1/2)((|0⟩ + |1⟩) + (|0⟩ − |1⟩)) = |0⟩.
2.2.2 Quantum Registers
A system of two quantum bits is a four-dimensional Hilbert space H₄ = H₂ ⊗ H₂ having orthonormal basis {|0⟩|0⟩, |0⟩|1⟩, |1⟩|0⟩, |1⟩|1⟩}. We also write |0⟩|0⟩ = |00⟩, |0⟩|1⟩ = |01⟩, etc. A state of a two-qubit system is a unit-length vector

    c₀|00⟩ + c₁|01⟩ + c₂|10⟩ + c₃|11⟩,    (2.8)

so it is required that |c₀|² + |c₁|² + |c₂|² + |c₃|² = 1.
Observation of a two-qubit system in state (2.8) will give 00, 01, 10, and 11 as an outcome with probabilities |c₀|², |c₁|², |c₂|², and |c₃|² respectively. On the other hand, if we choose to observe only one of the qubits, the standard rules of probability apply. This means that an observation of the first (resp. second) qubit will give 0 or 1 with probabilities |c₀|² + |c₁|² and |c₂|² + |c₃|² (resp. |c₀|² + |c₂|² and |c₁|² + |c₃|²).
Remark 2.2.1. Notice that here the tensor product of vectors does not commute: |0⟩|1⟩ ≠ |1⟩|0⟩. We use the linear ordering (write from left to right) to address the qubits individually.

Recall that a state z ∈ H₄ of a two-qubit system is decomposable if z can be written as a product of states in H₂, z = x ⊗ y. A state that is not decomposable is entangled.
Example 2.2.4. State (1/2)(|00⟩ + |01⟩ + |10⟩ + |11⟩) is decomposable, since

    (1/2)(|0⟩|0⟩ + |0⟩|1⟩ + |1⟩|0⟩ + |1⟩|1⟩) = (1/√2)(|0⟩ + |1⟩) · (1/√2)(|0⟩ + |1⟩),

as is easily verified. On the other hand, state (1/√2)(|00⟩ + |11⟩) is entangled. To see this, assume on the contrary that

    (1/√2)(|00⟩ + |11⟩) = (a₀|0⟩ + a₁|1⟩)(b₀|0⟩ + b₁|1⟩)
                        = a₀b₀|00⟩ + a₀b₁|01⟩ + a₁b₀|10⟩ + a₁b₁|11⟩

for some complex numbers a₀, a₁, b₀, and b₁. But then

    a₀b₀ = 1/√2,  a₀b₁ = 0,  a₁b₀ = 0,  a₁b₁ = 1/√2,

which is absurd.
A binary quantum gate is a unitary mapping H₄ → H₄. To give an example of such a gate, we again use the coordinate representation |00⟩ = (1, 0, 0, 0)ᵀ, |01⟩ = (0, 1, 0, 0)ᵀ, |10⟩ = (0, 0, 1, 0)ᵀ, and |11⟩ = (0, 0, 0, 1)ᵀ.
Remark 2.2.2. If two qubits are in an entangled state (1/√2)(|00⟩ + |11⟩), then observing one of them will give 0 or 1, both with a probability of 1/2, but it is not possible to observe different values on the qubits. It is interesting to notice that experiments have shown that this correlation can remain even if the qubits are spatially separated by more than 10 km [67]. This distant correlation opens opportunities for quantum cryptography and quantum communication protocols. A pair of qubits in state (1/√2)(|00⟩ + |11⟩) is called an EPR-pair.³ Notice that if both qubits of the EPR-pair are run through a Hadamard gate, the resulting state is again (1/√2)(|00⟩ + |11⟩), so it is impossible to give an ignorance interpretation for the EPR-pair.

³ EPR stands for Einstein, Podolsky and Rosen, who first regarded the distant correlation as a source of a paradox of quantum physics [69].
Example 2.2.5. Matrix

    M_cnot = ( 1  0  0  0 )
             ( 0  1  0  0 )
             ( 0  0  0  1 )
             ( 0  0  1  0 )

defines a unitary mapping, whose action on the basis states is M_cnot|00⟩ = |00⟩, M_cnot|01⟩ = |01⟩, M_cnot|10⟩ = |11⟩, M_cnot|11⟩ = |10⟩. Gate M_cnot is called controlled not, since the second qubit (the target qubit) is flipped if and only if the first one (the control qubit) is 1.
We will give another example of a binary quantum gate after the following
important definition.
Definition 2.2.3. The tensor product, also called the Kronecker product, of an r × s matrix A and a t × u matrix B,

    A = ( a₁₁ a₁₂ ... a₁ₛ )            B = ( b₁₁ b₁₂ ... b₁ᵤ )
        ( a₂₁ a₂₂ ... a₂ₛ )                ( b₂₁ b₂₂ ... b₂ᵤ )
        (  .   .   ...  . )     and        (  .   .   ...  . )
        ( aᵣ₁ aᵣ₂ ... aᵣₛ )                ( bₜ₁ bₜ₂ ... bₜᵤ ),

is the rt × su matrix defined as

    A ⊗ B = ( a₁₁B a₁₂B ... a₁ₛB )
            ( a₂₁B a₂₂B ... a₂ₛB )
            (  .    .   ...  .   )
            ( aᵣ₁B aᵣ₂B ... aᵣₛB ).
If M₁ and M₂ are 2 × 2 matrices that describe unary quantum gates, then it is easy to verify that the joint action of M₁ on the first qubit and M₂ on the second one is described by the matrix M₁ ⊗ M₂. This also generalizes to quantum systems of any size: if matrices M₁ and M₂ define unitary mappings on Hilbert spaces Hₙ and Hₘ, then the nm × nm matrix M₁ ⊗ M₂ defines a unitary mapping on the space Hₙ ⊗ Hₘ. The action of M₁ ⊗ M₂ is exactly the same as the action of M₁ on Hₙ followed by the action of M₂ on Hₘ (or vice versa). In particular, the mapping M₁ ⊗ M₂ cannot introduce entanglement between the systems Hₙ and Hₘ.
Example 2.2.6. Let M₁ = M₂ = W₂ be the Hadamard matrix of the previous section. Action on both qubits with a Hadamard-Walsh matrix can be seen as a binary quantum gate, whose matrix is given by

    W₄ = W₂ ⊗ W₂ = (1/2) ( 1   1   1   1 )
                         ( 1  −1   1  −1 )
                         ( 1   1  −1  −1 )
                         ( 1  −1  −1   1 ),

and the action of W₄ can be written as

    W₄|x₀x₁⟩ = (1/2)(|0⟩ + (−1)^{x₀}|1⟩)(|0⟩ + (−1)^{x₁}|1⟩)
             = (1/2)(|00⟩ + (−1)^{x₁}|01⟩ + (−1)^{x₀}|10⟩ + (−1)^{x₀+x₁}|11⟩)    (2.9)

for any x₀, x₁ ∈ {0, 1}.
Remark 2.2.3. Note that the state (2.9) is decomposable. This is, of course, natural since the initial state |x₀x₁⟩ is decomposable, and also W₄ decomposes into a product of unary gates, W₄ = W₂ ⊗ W₂. On the other hand, M_cnot cannot be expressed as a tensor product of 2 × 2 matrices. This could, of course, be seen by assuming the contrary, as in Example 2.2.4, but we can use another argument. Consider a two-qubit system in the initial state |00⟩. The action of the Hadamard-Walsh gate on the first qubit transforms this into the state

    (1/√2)(|0⟩ + |1⟩)|0⟩ = (1/√2)(|00⟩ + |10⟩).    (2.10)

But the action of M_cnot turns the decomposable state (2.10) into the entangled state (1/√2)(|00⟩ + |11⟩). Because M_cnot introduces entanglement, it cannot be a tensor product of two unary quantum gates.
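This argument is easy to check numerically; a small sketch (hypothetical numpy code) builds W₂ ⊗ I₂ and M_cnot and applies them to |00⟩:

```python
import numpy as np

w2 = np.array([[1, 1],
               [1, -1]]) / np.sqrt(2)
cnot = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

ket00 = np.array([1, 0, 0, 0])

# Hadamard on the first qubit: the decomposable state (|00> + |10>)/sqrt(2).
after_h = np.kron(w2, np.eye(2)) @ ket00
print(after_h)            # [0.707... 0 0.707... 0]

# Controlled not then yields the entangled state (|00> + |11>)/sqrt(2).
print(cnot @ after_h)     # [0.707... 0 0 0.707...]
```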
By a quantum register of length m we understand an ordered system of m qubits. The state space of such a system is the m-fold tensor product H_{2^m} = H₂ ⊗ ⋯ ⊗ H₂, and the basis states are {|x⟩ | x ∈ {0, 1}^m}. Identifying x = x₁ ... x_m with the binary representation of a number, we can also say that the basis states of an m-qubit register are {|a⟩ | a ∈ {0, 1, ..., 2^m − 1}}.
A peculiar feature of the exponential packing density associated with quantum systems can be plainly seen here: a general state of a system of m quantum bits is

    c₀|0⟩ + c₁|1⟩ + ⋯ + c_{2^m−1}|2^m − 1⟩,

where |c₀|² + |c₁|² + ⋯ + |c_{2^m−1}|² = 1. That is to say, a general description of a system of m two-state quantum systems requires 2^m complex numbers. Hence, the description size of the system is exponential in its physical size.

The time evolution of an m-qubit system is determined by unitary mappings in H_{2^m}. The size of the matrix of such a mapping is 2^m × 2^m, also exponential in the physical size of the system.⁴ A more detailed explanation of quantum register processing is provided in Section 2.3.3.

⁴ It should not be any more surprising that Feynman found the effective deterministic simulation of a quantum system to be difficult. Due to the interference effects, it also seems to be difficult to efficiently simulate a quantum system with a probabilistic computer.
2.2.3 More on Quantum Information
So far, we have been talking about qubits, but everything generalizes also to n-ary quantum digits. If we have an alphabet A = {a₁, ..., aₙ}, we can identify the letters with the basis states |a₁⟩, ..., |aₙ⟩ of an n-level quantum system. We say that such a basis is a quantum representation of alphabet A. The key features of quantum representations were already listed at the end of the introductory chapter, and are:

• A set of n elements can be identified with the vectors of an orthonormal basis of an n-dimensional complex vector space Hₙ. We call Hₙ a state space. When we have a fixed basis |a₁⟩, ..., |aₙ⟩, we call these vectors basis states. Also, the basis that we choose to fix is usually called a computational basis.
• A general state of a quantum system is a unit-length vector in the state space. If a₁|a₁⟩ + ⋯ + aₙ|aₙ⟩ is a state, then the system is seen in state aᵢ with a probability of |aᵢ|².
• The state space of a compound system consisting of two subsystems is the tensor product of the subsystem state spaces.
• In this formalism, the state transformations are length-preserving linear mappings. It can be shown that these mappings are exactly the unitary mappings in the state space.
In the above list we have operations that can be done with quantum
systems (under this chosen formalism). We now present a somewhat surprising result called the "No-cloning Theorem" due to W. K. Wootters and
W. H. Zurek [72]. Consider a quantum system having n basis states |a₁⟩, ..., |aₙ⟩. Let us denote the state space by Hₙ and specify the state |a₁⟩ to be
the "blanck sheet state". A unitary mapping in Hn ® Hn is called a quantum
copymachine, if for any state Ix) E H n ,
U(lx) lal)) = Ix) Ix).
Theorem 2.2.1 (No-cloning Theorem). For n > 1 there is no quantum copymachine.

Proof. Assume that a quantum copymachine exists, even though n > 1. Because n > 1, there are two orthogonal states |a₁⟩ and |a₂⟩. We should have U(|a₁⟩|a₁⟩) = |a₁⟩|a₁⟩ and U(|a₂⟩|a₁⟩) = |a₂⟩|a₂⟩, and also

    U((1/√2)(|a₁⟩ + |a₂⟩)|a₁⟩) = ((1/√2)(|a₁⟩ + |a₂⟩))((1/√2)(|a₁⟩ + |a₂⟩))
        = (1/2)(|a₁⟩|a₁⟩ + |a₁⟩|a₂⟩ + |a₂⟩|a₁⟩ + |a₂⟩|a₂⟩).

But since U is linear,

    U((1/√2)(|a₁⟩ + |a₂⟩)|a₁⟩) = (1/√2)U(|a₁⟩|a₁⟩) + (1/√2)U(|a₂⟩|a₁⟩)
        = (1/√2)|a₁⟩|a₁⟩ + (1/√2)|a₂⟩|a₂⟩.

The above two representations for U((1/√2)(|a₁⟩ + |a₂⟩)|a₁⟩) do not coincide by the very definition of the tensor product, a contradiction. □
The No-cloning Theorem thus states that there is no allowed operation (unitary mapping) that would produce a copy of an arbitrary quantum state. Notice also that, in the above proof, we did not make any use of unitarity; only the linearity of the time-evolution mapping was needed. If, however, we are satisfied with cloning only the basis states, there is a solution: let I = {1, ..., n} be the set of indices. The partially defined mapping f : I × I → I × I, f(i, 1) = (i, i) is clearly injective, so we can complete the definition (in many ways) such that f becomes a permutation of I × I that still satisfies f(i, 1) = (i, i). Let f(i, j) = (i′, j′). Then the linear mapping defined by U|aᵢ⟩|aⱼ⟩ = |a_{i′}⟩|a_{j′}⟩ is a permutation of the basis vectors of Hₙ ⊗ Hₙ, and any such permutation is unitary, as is easily verified. Moreover, U(|aᵢ⟩|a₁⟩) = |aᵢ⟩|aᵢ⟩, so U is a copymachine operation on basis vectors.
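A minimal sketch of such a basis-state copymachine (hypothetical numpy code, n = 3, with index 0 playing the role of the blank state |a₁⟩) builds the permutation matrix U on Hₙ ⊗ Hₙ and checks that it is unitary and copies every basis state:

```python
import numpy as np

n = 3
N = n * n   # dimension of H_n (x) H_n

def f(i, j):
    """Permutation of index pairs with f(i, 0) = (i, i): within
    each block i, the second indices 0 and i are swapped."""
    if j == 0:
        return (i, i)
    if j == i:
        return (i, 0)
    return (i, j)

U = np.zeros((N, N))
for i in range(n):
    for j in range(n):
        k, l = f(i, j)
        U[k * n + l, i * n + j] = 1          # |a_i>|a_j> -> |a_k>|a_l>

print(np.allclose(U @ U.T, np.eye(N)))       # permutation => unitary
for i in range(n):                           # copies every basis state
    e = np.zeros(N); e[i * n + 0] = 1        # |a_i>|blank>
    print(np.argmax(U @ e) == i * n + i)     # lands on |a_i>|a_i>
```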
Remark 2.2.4. Notice that, when defining the unitary mapping U(|aᵢ⟩|a₁⟩) = |aᵢ⟩|aᵢ⟩, we do not need to assume that the second system looks like the first one; the only thing we need is that the second system must have at least as many basis states as the first one. We could, therefore, also define a unitary mapping U|aᵢ⟩|b₁⟩ = |aᵢ⟩|bᵢ⟩, where |b₁⟩, ..., |bₘ⟩ (m ≥ n) are the basis states of some other system. What is interesting here is that we could regard the second system as a measurement apparatus designed to observe the state of the first system. Thus, |b₁⟩ could be interpreted as the "initial pointer state", and U as a "measurement interaction". Measurements that can be described in this fashion are called von Neumann-Lüders measurements. The measurement interaction is also discussed in Section 7.4.2.
We conclude this section with a discussion concerning the observation
procedure. So far, we have silently assumed that quantum systems are used
for probabilistic information processing according to the following scheme:
• The system is first prepared in an initial basis state.
• Next, the unitary information processing is carried out.
• Finally, we observe the system to see the outcome.
What is missing in the above procedure is that we have not considered the
possibility of making an intermediate observation during the unitary processing. In other words, we have not discussed the effect of observation upon
the quantum system state. For a more systematic treatment of observation, the reader is advised to study Section 7.3.2. For now, we will present, in a
simplified way, the most widely used method to handle state changes during
an observation procedure. Suppose that a system in state

    c₁|x₁⟩ + c₂|x₂⟩ + ⋯ + cₙ|xₙ⟩

is observed with x_k as the outcome. According to the projection postulate,⁵ the state of the system after the observation is |x_k⟩. Suppose, then, that we have a compound system in state

    Σ_{i=1}^{n} Σ_{j=1}^{m} a_ij |x_i⟩|y_j⟩,    (2.11)

and the first system is observed with outcome x_k (notice that the probability of observing x_k is P(k) = Σ_{j=1}^{m} |a_kj|²). The projection postulate now implies that the post-observation state of the whole system is

    (1/√P(k)) Σ_{j=1}^{m} a_kj |x_k⟩|y_j⟩.    (2.12)
In other words, the initial state (2.11) of the system is projected to the
subspace that corresponds to the observed state, and renormalized to the
unit length. It is now worth emphasizing heavily that the state evolution from (2.11) to (2.12) given by the projection postulate is not consistent with the unitary time evolution: unitary evolution is always reversible, but there is no way to recover state (2.11) from (2.12). In fact, no explanation for the observation process which is consistent with quantum mechanics (using a certain interpretation) has ever been discovered. The difficulty arising when trying to find such an explanation is usually referred to as the measurement paradox of quantum physics.

⁵ The projection postulate can be seen as an ad hoc explanation for the state transform during the measurement process.
However, instead of having intermediate observations that cause a "collapse of the state vector" from (2.11) to (2.12), we can use an auxiliary system with a unitary measurement interaction (a basis-state copymachine) |x_i⟩|x₁⟩ ↦ |x_i⟩|x_i⟩. That is, we replace the collapse (2.11) ↦ (2.12) by a measurement interaction, which turns the state

    (Σ_{i=1}^{n} Σ_{j=1}^{m} a_ij |x_i⟩|y_j⟩)|x₁⟩ = Σ_{i=1}^{n} Σ_{j=1}^{m} a_ij |x_i⟩|y_j⟩|x₁⟩    (2.13)

into

    Σ_{i=1}^{n} Σ_{j=1}^{m} a_ij |x_i⟩|y_j⟩|x_i⟩.    (2.14)

The transformation from (2.13) to (2.14), despite appearing different from (2.11) ↦ (2.12) at first glance, has many similar features. In fact, using the notation P(k) = Σ_{j=1}^{m} |a_kj|² again, we can rewrite (2.14) as

    Σ_{k=1}^{n} √P(k) ((1/√P(k)) Σ_{j=1}^{m} a_kj |x_k⟩|y_j⟩)|x_k⟩.    (2.15)

Let us now interpret (2.14): the third system which we introduced shatters the whole space into n orthogonal subspaces which are, for each k ∈ {1, ..., n}, spanned by the vectors |x_i⟩|y_j⟩|x_k⟩, i ∈ {1, ..., n}, j ∈ {1, ..., m}. The left multipliers of the vectors √P(k)|x_k⟩ are unit-length vectors exactly the same as in (2.12). Moreover, the probability of observing x_k in the right-most system of (2.14) is P(k). Therefore, it should be clear that, if the quantum information processing continues independently of the third system, then the final probability distribution is the same if the operation (2.11) ↦ (2.12) is replaced with the operation (2.13) ↦ (2.14). For this reason, we will not consider intermediate observations in this book, but may refer to them only as notational simplifications of the procedure (2.13) ↦ (2.14).
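The projection postulate itself is straightforward to simulate classically; the sketch below (hypothetical numpy code with random amplitudes) picks an outcome k with probability P(k) and renormalizes the corresponding block of amplitudes, as in (2.12).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 2, 3

# Random amplitudes a_ij for the state sum_ij a_ij |x_i>|y_j>, normalized.
a = rng.normal(size=(n, m)) + 1j * rng.normal(size=(n, m))
a /= np.linalg.norm(a)

P = (np.abs(a) ** 2).sum(axis=1)       # P(k) = sum_j |a_kj|^2
k = rng.choice(n, p=P)                 # observe x_k on the first system

post = np.zeros_like(a)                # project onto the subspace of x_k
post[k] = a[k] / np.sqrt(P[k])         # and renormalize, as in (2.12)
print(k, np.linalg.norm(post))         # the result has unit length again
```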
2.2.4 Quantum Turing Machines
To provide a model for quantum computation, one can consider a straightforward generalization of a probabilistic Turing machine, replacing the probabilities with transition amplitudes. Thus, there should be a transition amplitude function

    δ : Q × A × Q × A × {−1, 0, 1} → ℂ

such that δ(q₁, a₁, q₂, a₂, d) gives the amplitude that, whenever the machine is in state q₁ scanning symbol a₁, it will replace a₁ with a₂, enter state q₂, and move the head in direction d ∈ {−1, 0, 1}.
Definition 2.2.4. A quantum Turing machine (QTM) over alphabet A is a sixtuple (Q, A, δ, q₀, q_a, q_r), where q₀, q_a, q_r ∈ Q are the initial, accepting, and rejecting states. The transition amplitude function must satisfy

    Σ_{(q₂, a₂, d) ∈ Q × A × {−1, 0, 1}} |δ(q₁, a₁, q₂, a₂, d)|² = 1

for any (q₁, a₁) ∈ Q × A. Again, to avoid difficulties with computability issues, we require that, for any value δ(q₁, a₁, q₂, a₂, d) = x + yi, x and y are rational.

Remark 2.2.5. In some cases it would be handy to also accept amplitudes which are not rational. We will consider this later.
Notions such as "configuration yields another" are straightforward generalizations of those associated with probabilistic computation. We can also generalize the acceptance models NP, RP, ZPP, and BPP to correspond to quantum computation. For any family F of languages accepted by some classical model of computation (deterministic or probabilistic), we usually write QF to stand for the corresponding class in quantum computation. However, there is already quite a well-established tradition to denote the quantum counterpart of BPP⁶ not by QBPP but by BQP, and the quantum counterpart of P⁷ is usually denoted by EQP, standing for Exact acceptance by a Quantum computer in Polynomial time. Also, the quantum counterpart of NP is usually denoted by NQP.

⁶ The quantum counterpart of BPP is the family of languages accepted by a polynomial-time quantum Turing machine with a correctness probability of at least 2/3.
⁷ The family of languages accepted by a polynomial-time quantum Turing machine with a probability of 1.
As in probabilistic computation, a general configuration of a quantum Turing machine can be seen as a combination

    a₁c₁ + a₂c₂ + a₃c₃ + ⋯    (2.16)

of basis configurations. Taking the basis configurations as an orthonormal basis of an infinite-dimensional vector space, we see that the general configurations, the superpositions, are merely the unit-length vectors of the configuration space. Moreover, the transition amplitude function determines a linear mapping M_δ in the configuration space. From the theory of quantum mechanics (to be precise: from the formalism that we are using) there now arises a further requirement on the transition amplitude function δ: the linear mapping M_δ in the configuration space determined by δ should be unitary.

It turns out that the unitarity of the mapping M_δ can be determined by local conditions on the transition amplitude function (see [9], [34], and [47]), but the conditions are quite technical and we will not present them here. Instead, we will study quantum circuits as a model for quantum
computation in the next chapter. The reason for this choice is that, compared to QTMs, the quantum circuit model is much more straightforward to use for describing quantum algorithms. In fact, all of the quantum algorithms that are studied in Chapters 3-5 are given by using the quantum circuit formalism.

We end this section by examining a couple of properties of quantum Turing machines. The first thing to consider is that the transitions of a QTM determine a unitary and, therefore, reversible time evolution in the configuration space. However, ordinary Turing machines can be irreversible, too.⁸ The question is how powerful reversible computation is. Can we design a reversible Turing machine which performs each particular computational task? A positive answer to this question was first given by Lecerf in [39], who intended to demonstrate that the Post Correspondence Problem⁹ remains undecidable even for injective morphisms. Lecerf's constructions were later extended by Ruohonen in [61], but Bennett in [6] was the first to give a model for reversible computation simulating the original, possibly irreversible, computation with constant slowdown but possibly with a huge increase in the space consumed (see also [35]).

Bennett's work was at least partially motivated by a thermodynamical problem: according to Landauer [38], an irreversible overwriting of a bit causes at least kT ln 2 joules of energy dissipation.¹⁰ This theoretical lower bound can always be avoided by using reversible computation, which is possible according to the results of Lecerf and Bennett.
Bennett's construction of a reversible Turing machine uses a three-tape Turing machine with an input tape, a history tape, and an output tape. Reversibility is obtained by simulating the original machine on the input tape, while writing down the history of the computation, i.e., the transition rules that have been used so far, onto the history tape. When the machine stops, the output is copied from the input tape to the empty output tape, and the computation is run backwards (also a reversible procedure) to erase the history tape for future use. The amount of space this construction consumes is proportional to the computation time of the original computation, but by applying the erasure of the history tape recursively, the space requirement can be reduced even to O(s(n) log t(n)), at the same time using time O(t(n) log t(n)), where s and t are the original space and time consumption [7]. For space/time trade-offs in reversible computing, see also [40].
⁸ We call an ordinary Turing machine reversible if each configuration admits a unique predecessor.
⁹ The Post Correspondence Problem, or PCP for short, was among the first computational problems that were shown to be undecidable. The undecidability of the PCP was established by Emil Post in [54]. The importance of the PCP lies in the simple combinatorial formulation of the problem; the PCP is very useful for establishing other undecidability results and for studying the boundary between decidability and undecidability.
¹⁰ Here k = 1.380658 · 10⁻²³ J/K is Boltzmann's constant and T is the absolute temperature.
Thus, it is established that whatever is computable by a Turing machine is also computable by a reversible Turing machine, which can be seen as a quantum Turing machine as well. Therefore, QTMs are at least as powerful as ordinary Turing machines. The next question which arises is whether QTMs can do something that ordinary Turing machines cannot. The answer to this question is no, the reason being the same as for probabilistic Turing machines: we can use an ordinary Turing machine to simulate the computation of a QTM by remembering all the coexisting configurations in a superposition and their amplitudes. Notice that here we again use the assumption that all the transition amplitudes can be written as x + yi, where x and y are rational. This assumption can, of course, be criticized, but the feasibility of a model having arbitrary complex-number amplitudes is also questionable. It is, however, true that in some operations on quantum systems, non-rational amplitudes such as 1/√2 would be very natural. Therefore, in practice, we will also use amplitudes x + yi where x and y are not rational, but only if there is an "efficient" way to find rational approximations for x and y. It is left to the reader to verify that, in each example in this book, the amplitudes not expressible by rational numbers can be well approximated by amplitudes which can be expressed by using rational numbers.

We have good reasons to believe (as did Feynman) that QTMs cannot be efficiently simulated by ordinary ones, not even by probabilistic ones. One major reason to believe this lies in Shor's factoring algorithm [63]: there is a polynomial-time probabilistic quantum algorithm for factoring integers, but no classical polynomial-time algorithm for that purpose has been found, not even a probabilistic one, despite the huge effort.
The next and final issue connected to QTMs in this section is the existence of a universal quantum Turing machine. We say that a Turing machine is universal if it can simulate the computation of any other Turing machine in the following sense: the description (the transition rules) and the input of an arbitrary Turing machine M are given to the universal machine U suitably encoded, and the universal machine U performs the computation of M on the encoded string. We could also say that the universal Turing machine U is programmable so as to perform any Turing machine computation. It can be shown that a universal Turing machine with 5 states and alphabet size 5 exists; see [57]. In [23], D. Deutsch proved the existence of a universal quantum Turing machine capable of simulating any other QTM with arbitrary precision, but Deutsch did not consider the efficiency of the simulation. Bernstein and Vazirani supplied in [9] a construction of a universal QTM which simulates any other quantum Turing machine in the following sense: given the description of a QTM M, a natural number T, and ε > 0, the universal machine can simulate M with precision ε for T steps in time polynomial in T and 1/ε.
2.3 Circuits
2.3.1 Boolean Circuits
Recall that a Turing machine can be regarded as a facility for computing a partially defined function f : A* → A*. We will now fix A = F₂ = {0, 1} to be the binary alphabet¹¹ and consider Boolean circuits computing functions {0, 1}ⁿ → {0, 1}ᵐ. As the basic elements of these circuits we choose the functions ∧ : F₂² → F₂ (the logical and-gate) defined by ∧(x₁, x₂) = 1 if and only if x₁ = x₂ = 1, ∨ : F₂² → F₂ (the logical or-gate) defined by ∨(x₁, x₂) = 0 if and only if x₁ = x₂ = 0, and the logical not-gate ¬ : F₂ → F₂ defined by ¬x = 1 − x.

¹¹ F₂ stands for the binary field with two elements 0 and 1, the addition and multiplication defined by 0 + 0 = 1 + 1 = 0, 0 + 1 = 1, 0·0 = 0·1 = 0, 1·1 = 1. Thus −1 = 1 and 1⁻¹ = 1 in F₂.
A Boolean circuit is an acyclic, directed graph whose nodes are labelled either with input variables, output variables, or the logical gates ∧, ∨, and ¬. An input variable node has no incoming arrows, while an output variable node has no outgoing arrows but exactly one incoming arrow. Nodes ∧, ∨, and ¬ have 2, 2, and 1 incoming arrows respectively. In connection with Boolean circuits, the arrows of the graph are also called wires. The number of nodes of a circuit is called the complexity of the Boolean circuit.

A Boolean circuit with n input variables x₁, ..., xₙ and m output variables y₁, ..., yₘ naturally defines a function F₂ⁿ → F₂ᵐ: the input is encoded by giving the input variables values 0 and 1, and each gate computes a primitive function ∧, ∨, or ¬. The value of the function is given in the sequence of the output variables y₁, ..., yₘ.
Example 2.3.1. The Boolean circuit in Figure 2.1 computes the function f : F₂² → F₂² defined by f(0, 0) = (0, 0), f(0, 1) = (0, 1), f(1, 0) = (0, 1), and f(1, 1) = (1, 0). Thus, f(x₁, x₂) = (y₁, y₂), where y₂ is x₁ + x₂ modulo 2 and y₁ is the carry bit.

Fig. 2.1. A Boolean circuit computing the sum of bits x₁ and x₂
Emil Post has proven a powerful theorem characterizing the complete sets of truth functions [53], which implies that all functions F₂ⁿ → F₂ᵐ can be computed by using Boolean circuits with gates ∧, ∨, and ¬. A set S of gates is called universal if all functions F₂ⁿ → F₂ can be constructed by using the gates in S. In what follows, we will demonstrate that S = {∧, ∨, ¬} is a universal set of gates. The fact that all functions F₂ⁿ → F₂ᵐ are then also computable follows from the observation that one can design any function f : F₂ⁿ → F₂ᵐ using m functions which each compute a single bit.
First, it is easy to verify that ∨(x₁, ∨(x₂, x₃)) equals 0 if and only if x₁ = x₂ = x₃ = 0. For short, we denote ∨(x₁, ∨(x₂, x₃)) = x₁ ∨ x₂ ∨ x₃. Clearly this generalizes to any number of variables: using only the function ∨ on two variables, it is possible to construct a function x₁ ∨ x₂ ∨ ⋯ ∨ xₙ, which takes value 0 if and only if all the variables are 0. Similarly, using only ∧ on two variables, one can construct a function x₁ ∧ x₂ ∧ ⋯ ∧ xₙ, which takes value 1 if and only if all the variables are 1. We define a function M_a for each a = (a₁, ..., aₙ) ∈ F₂ⁿ by

    M_a(x₁, ..., xₙ) = φ₁(x₁) ∧ φ₂(x₂) ∧ ⋯ ∧ φₙ(xₙ),

where φᵢ(xᵢ) = ¬xᵢ if aᵢ = 0, and φᵢ(xᵢ) = xᵢ if aᵢ = 1. Thus, the functions M_a can be constructed by using only ∧ and ¬. Moreover, M_a(x₁, ..., xₙ) = 1 if and only if φᵢ(xᵢ) = 1 for each i, and φᵢ(xᵢ) = 1 if and only if xᵢ = aᵢ. It follows that M_a is the characteristic function of the singleton {a}: M_a(x) = 1 if and only if x = a. Using the characteristic functions M_a it is easy to build any function f: f = M_{a₁} ∨ M_{a₂} ∨ ⋯ ∨ M_{aₖ}, where a₁, a₂, ..., aₖ are exactly the elements of F₂ⁿ for which f takes value 1.
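This construction is effective; a short sketch (plain Python with an arbitrarily chosen test function, not from the book) carries it out, building f as an or of characteristic functions M_a using only the two-argument and/or gates and not:

```python
from functools import reduce
from itertools import product

def AND(x, y): return x & y
def OR(x, y):  return x | y
def NOT(x):    return 1 - x

def M(a):
    """Characteristic function of the singleton {a}: an AND of literals."""
    return lambda x: reduce(AND, (xi if ai else NOT(xi)
                                  for ai, xi in zip(a, x)), 1)

def build(f, n):
    """Represent f as M_a1 OR M_a2 OR ... over the points where f is 1."""
    ones = [a for a in product((0, 1), repeat=n) if f(a)]
    return lambda x: reduce(OR, (M(a)(x) for a in ones), 0)

maj = lambda x: int(sum(x) >= 2)        # arbitrary test function on 3 bits
g = build(maj, 3)
print(all(g(x) == maj(x) for x in product((0, 1), repeat=3)))   # True
```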
Boolean circuits thus provide a facility for computing functions F₂ⁿ → F₂ᵐ, where the number of input variables is fixed. Let us then consider a function f : {0, 1}* → {0, 1} defined on all binary strings. Let fₙ be the restriction of f to {0, 1}ⁿ. For each n there is a Boolean circuit Cₙ with n input variables and one output variable computing fₙ, and we say that C₀, C₁, C₂, C₃, ... is a family of Boolean circuits computing f. A family of Boolean circuits having only one output node can also be used to recognize languages over the binary alphabet: we regard a binary string of length n as accepted if the output of Cₙ is 1; otherwise that string is rejected.

So far we have demonstrated that, for an arbitrary function fₙ : F₂ⁿ → F₂, there is a circuit computing fₙ. Thus, for an arbitrary binary language L there is a family C₀, C₁, C₂, ... accepting L. Does this violate the Church-Turing thesis? There are only countably many Turing machines, so there are only countably many binary languages that can be accepted by Turing machines. But all binary languages (and there are uncountably many such languages) can be accepted by Boolean circuit families. Is the notion of a circuit family a stronger algorithmic device than a Turing machine? The answer is that the
notion of a circuit family is not an algorithmic device at all. The reason is that the way in which we constructed the circuit Cₙ requires exact knowledge of the values of fₙ. In other words, without knowing the values of f, we do not know how to construct the circuit family C₀, C₁, C₂, ...; we can only state that such a family exists but, without the construction, we cannot say that the circuit family C₀, C₁, C₂, ... is an algorithmic device.

We say that a language L has uniformly polynomial circuits if there exists a Turing machine M that on input 1ⁿ (n consecutive 1's) outputs the graph of circuit Cₙ using space O(log n), and the family C₀, C₁, C₂, ... accepts L.
For the proof of the following theorem, consult [48].
Theorem 2.3.1. A language L has uniformly polynomial circuits if and only if L ∈ P.

Since our main concern in this book is polynomial-time quantum computation, we will use a quantum circuit formalism which is analogous to the uniformly polynomial circuits.
2.3.2 Reversible Circuits
In Section 2.2 we introduced unary and binary quantum gates as unitary mappings. This is a generalization of the notion of reversible gates, so let us now study some properties of reversible gates. A reversible gate on m bits is a permutation on F₂ᵐ, so there are m input and m output bits. Clearly there are (2ᵐ)! reversible gates on m bits.

Example 2.3.2. The function T : F₂³ → F₂³, T(x₁, x₂, x₃) = (x₁, x₂, x₁x₂ − x₃) defines a reversible gate on three bits. This gate is called the Toffoli gate. The Toffoli gate does not change bits x₁ and x₂, but computes the not-operation on x₃ if and only if x₁ = x₂ = 1. The symbol in Figure 2.2 is used to signify the Toffoli gate.

Fig. 2.2. The Toffoli gate
A reversible circuit is a permutation on F₂ⁿ composed of reversible gates. Thus, we make no fundamental distinction between reversible gates and reversible circuits; the only difference is contextual: we usually assume that there is a fixed set of reversible gates, and an arbitrarily large reversible circuit can be built from them.

We call a set R of reversible gates universal if any reversible circuit C : F₂ⁿ → F₂ⁿ can be constructed using the gates in R, constants, and some workspace. This means that, using the gates in R, we can construct a permutation f : F₂^{n+m} → F₂^{n+m} and a fixed constant vector (c₁, ..., cₘ) ∈ F₂ᵐ such that

    f(x₁, ..., xₙ, c₁, ..., cₘ) = (y₁, ..., yₙ, c₁′, ..., cₘ′),

where (y₁, ..., yₙ) = C(x₁, ..., xₙ). Moreover, in this construction we also allow bit permutations that swap any two bits:

    (x₁, ..., xᵢ, ..., xⱼ, ..., xₙ) ↦ (x₁, ..., xⱼ, ..., xᵢ, ..., xₙ),

but these bit permutations could, of course, be avoided by requiring only that f simulates C in such a way that the constants are scattered among the variables in fixed places, and that the output can be read at fixed bit positions. If a universal set R consists of a single gate, this gate is said to be universal. When a universal set R of reversible gates is fixed, we again say that the complexity of a circuit C (with respect to the set R) is the number of gates in C.
Earlier we demonstrated that a Boolean circuit constructed of the irreversible gates ∧, ∨, and ¬ can compute any function F₂ⁿ → F₂ᵐ, which especially implies that, by using these gates, we can build any reversible circuit as well. Recalling that any Turing machine can also be simulated by a reversible Turing machine, it is no longer surprising that all Boolean circuits can be simulated by using only reversible gates. In fact:

• Not-gates are already reversible, so we can use them directly.
• And-gates are irreversible, and we therefore simulate them by using a Toffoli gate T(x₁, x₂, x₃) = (x₁, x₂, x₁x₂ − x₃). Notice that T(x₁, x₂, 0) = (x₁, x₂, x₁x₂), so, using the constant 0 on the third component, a Toffoli gate computes the logical and of the variables x₁ and x₂.
• Since x₁ ∨ x₂ = ¬(¬x₁ ∧ ¬x₂), we can replace all the ∨-gates in a Boolean circuit by ∧- and ¬-gates.
• The fanout (multiple wires leaving a gate) is simulated by a controlled not-gate C : F₂² → F₂², which is a permutation defined by C(x₁, x₂) = (x₁, x₁ − x₂), so the second bit is negated if and only if the first bit is one. Again, C(x₁, 0) = (x₁, x₁), so, by using the constant 0, we can duplicate the first bit x₁.

Because of the above properties, we can construct, for each Boolean circuit, a reversible circuit simulating the original one. This construction uses not-gates, Toffoli gates, controlled not-gates, and the constant 0. But using the constant 1, the Toffoli gates can also be used to replace the not- and controlled not-gates: T(1, 1, x₁) = (1, 1, ¬x₁) and T(1, x₁, x₂) = (1, x₁, x₁ − x₂), so Toffoli gates with constants 0 and 1 are sufficient to simulate any Boolean circuit with gates ∧, ∨, and ¬. Since Boolean circuits can be used to build up an arbitrary function F₂ⁿ → F₂ᵐ, we have obtained the following theorem.
Theorem 2.3.2. A Toffoli gate is a universal reversible gate.
Remark 2.3.1. A more straightforward proof for the above theorem was given by Toffoli in [68]. It is interesting to note that there are universal two-qubit gates for quantum computation [25], but it can be shown that there are no universal reversible two-bit gates. In fact, there are only 4! = 24 reversible two-bit gates, and all of them are linear, i.e., they can all be expressed as T(x) = Ax + b, where b ∈ F₂² and A is an invertible matrix over the binary field. Thus, any function composed of them is also linear, but there are also nonlinear reversible gates, such as the Toffoli gate.
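The identities used in the simulation above can be checked exhaustively; a minimal sketch in Python (not from the book) verifies that Toffoli gates with constants reproduce and, not, and controlled not:

```python
from itertools import product

def T(x1, x2, x3):
    """Toffoli gate (x1, x2, x1 x2 - x3); note that -x3 = x3 in F_2,
    so the third component is x1 x2 XOR x3."""
    return (x1, x2, (x1 & x2) ^ x3)

for x1, x2 in product((0, 1), repeat=2):
    assert T(x1, x2, 0)[2] == (x1 & x2)      # and, via the constant 0
    assert T(1, x1, x2)[2] == (x1 ^ x2)      # controlled not, via one 1
for x in (0, 1):
    assert T(1, 1, x)[2] == 1 - x            # not, via two constants 1
print("Toffoli simulates and, not, and controlled not")
```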
2.3.3 Quantum Circuits
We identify again the bit strings x ∈ F₂ᵐ with an orthonormal basis {|x⟩ | x ∈ F₂ᵐ} of a 2ᵐ-dimensional Hilbert space H_{2^m}. To represent linear mappings, we adopt the coordinate representation |x⟩ = e_i = (0, ..., 1, ..., 0)ᵀ, where e_i is a column vector having zeroes elsewhere but 1 in the ith position, if the components of x = (x₁, ..., xₘ) form a binary representation of the number i − 1.

A reversible gate f on m bits is a permutation of F₂ᵐ, so any reversible gate also defines a linear mapping in H_{2^m}. There is a 2ᵐ × 2ᵐ permutation matrix M(f)¹² representing this mapping: M(f)_{ij} = 1 if f(e_j) = e_i, and M(f)_{ij} = 0 otherwise.
Example 2.3.3. In F₂³ we denote |000⟩ = (1, 0, ..., 0)ᵀ, |001⟩ = (0, 1, ..., 0)ᵀ, ..., |111⟩ = (0, 0, ..., 1)ᵀ. The matrix representation of the Toffoli gate is the block-diagonal matrix

    M(T) = ( I₂  0   0   0  )
           ( 0   I₂  0   0  )
           ( 0   0   I₂  0  )
           ( 0   0   0   M_¬ ),

where I₂ is the 2 × 2 identity matrix and M_¬ is the matrix of the not-gate (Example 2.2.1).
Quantum gates generally are introduced as straightforward generalizations of reversible gates.
¹² A permutation matrix is a matrix having entries 0 and 1, with exactly one 1 in each row and column.
Definition 2.3.1. A quantum gate on m qubits is a unitary mapping in H₂ ⊗ ⋯ ⊗ H₂ (m times) which operates on a fixed number (independent of m) of qubits.

Because M(f)*_{ij} = 1 if and only if f(e_i) = e_j, M(f)* represents the permutation inverse to f. Therefore, a permutation matrix is always unitary, and reversible gates are special cases of quantum gates. The notion of a quantum circuit is obtained from that of a reversible circuit by replacing the reversible gates with quantum gates. The only difference between quantum gates and quantum circuits is here again contextual: we require that a quantum circuit must be composed of quantum gates which each operate on a bounded number of qubits.

Definition 2.3.2. A quantum circuit on m qubits is a unitary mapping on H_{2^m} which can be represented as a concatenation of a finite set of quantum gates.
Since reversible circuits are also quantum circuits, we have already discovered the fact that whatever is computable by a Boolean circuit is also computable by a quantum circuit. It is also interesting to compare the computational power of polynomial-size quantum circuits (that is, quantum circuits containing polynomially many quantum gates) and polynomial-time QTMs. A. Yao [73] has shown that their computational powers coincide.
3. Fast Factorization
In this chapter we present Shor's quantum algorithm for factoring integers. Shor's algorithm can be better understood after studying quantum Fourier transforms. The issues related to Fourier transforms and other mathematical details are handled in Chapter 8, but a reader having a solid mathematical knowledge of these concepts is advised to ignore the references to Chapter 8.
3.1 Quantum Fourier Transform
3.1.1 General Framework
Let G = {g₁, g₂, ..., gₙ} be an abelian group (we will use the additive notation) and {χ₁, χ₂, ..., χₙ} the characters of G (see Section 8.2). The functions f : G → ℂ form a complex vector space V; addition and scalar multiplication are defined pointwise. If f₁, f₂ ∈ V, then the standard inner product (see Section 8.3) of f₁ and f₂ is defined by

    ⟨f₁ | f₂⟩ = Σ_{k=1}^{n} f₁(g_k)* f₂(g_k).

Any inner product induces a norm by ‖f‖ = √⟨f | f⟩. In Section 8.2 it is demonstrated that the functions Bᵢ = (1/√n) χᵢ form an orthonormal basis of the vector space, so each f ∈ V can be represented as

    f = c₁B₁ + c₂B₂ + ⋯ + cₙBₙ,

where the cᵢ are complex numbers called the Fourier coefficients of f. The discrete Fourier transform of f ∈ V is another function f̂ ∈ V defined by f̂(gᵢ) = cᵢ. Since the functions Bᵢ form an orthonormal basis, we easily see that cᵢ = ⟨Bᵢ | f⟩, so

    f̂(gᵢ) = (1/√n) Σ_{k=1}^{n} χᵢ(g_k)* f(g_k).    (3.1)
The Fourier transform satisfies Parseval's identity

    ‖f‖ = ‖f̂‖,    (3.2)

which will be important in the sequel.
Let H be a finite quantum system capable of representing the elements of G. This means that {|g⟩ | g ∈ G} is an orthonormal basis of some quantum system H. To obtain the matrix representations for linear mappings, we use the coordinate representation |gᵢ⟩ = eᵢ = (0, ..., 1, ..., 0)ᵀ (all coordinates 0 except 1 in the ith position; see Section 8.3). The general states of the system are the unit-length linear combinations of the basis states. Thus, a general state

    c₁|g₁⟩ + c₂|g₂⟩ + ⋯ + cₙ|gₙ⟩    (3.3)

of H can be seen as a mapping

    f : G → ℂ,  where f(gᵢ) = cᵢ and ‖f‖ = 1,    (3.4)

and vice versa: each mapping (3.4) defines a state of H.
Definition 3.1.1. The quantum Fourier transform (QFT) is the operation

    Σ_{i=1}^{n} f(gᵢ)|gᵢ⟩ ↦ Σ_{i=1}^{n} f̂(gᵢ)|gᵢ⟩.    (3.5)

In other words, the QFT is just the ordinary Fourier transform of the function f : G → ℂ determined by the state (3.3) via (3.4). It is clear by formula (3.1) that the QFT (3.5) is linear, and by Parseval's identity (3.2), the QFT is even a unitary operation in H. As seen in Exercise 1, for basis vectors |gᵢ⟩ the operation (3.5) becomes

    |gᵢ⟩ ↦ (1/√n) Σ_{k=1}^{n} χᵢ(g_k)* |g_k⟩,    (3.6)

and hence it is clear that, in the basis |g₁⟩, ..., |gₙ⟩, the matrix of the QFT is

    (1/√n) ( χ₁(g₁)*  χ₂(g₁)*  ...  χₙ(g₁)* )
           ( χ₁(g₂)*  χ₂(g₂)*  ...  χₙ(g₂)* )
           (    .        .     ...     .    )
           ( χ₁(gₙ)*  χ₂(gₙ)*  ...  χₙ(gₙ)* ).    (3.7)
What kind of quantum circuit is needed to implement (3.6)? The problem which arises is that, in a typical situation, n = |G| = dim(H) is large, but usually H is a tensor product of smaller spaces. However, quantum circuit operations were required to be local, not affecting a large number of quantum digits at the same time. In other words, the problem is how to decompose the QFT matrix (3.7) into a tensor product of small matrices, or into a product of a few matrices that can be expressed as tensor products of matrices operating only on some small subspaces.
To approach this problem, let us assume that G = U ⊕ V is a direct sum of the subgroups U and V. Let r = |U| and s = |V|; hence |G| = rs. Let also U and V be represented by some quantum systems H_U and H_V with orthonormal bases

    {|u₁⟩, ..., |u_r⟩}  and  {|v₁⟩, ..., |v_s⟩}

respectively. Then the tensor product H_U ⊗ H_V represents G in a very natural way: each g ∈ G can be uniquely expressed as g = u + v, where u ∈ U and v ∈ V, so we represent g = u + v by |u⟩|v⟩. Since we have a decomposition Ĝ = Û × V̂, all the characters of G can be written as

    χᵢⱼ(g) = χᵢⱼ(u + v) = χᵢ^U(u) χⱼ^V(v),

where χᵢ^U and χⱼ^V are characters of U and V respectively (see Section 8.2.1), and (i, j) ∈ {1, ..., r} × {1, ..., s}.
Thus, the Fourier transform can also be decomposed:

Lemma 3.1.1 (Fourier transform decomposition). Let G = U ⊕ V be a direct sum of subgroups U and V and {|u⟩|v⟩ | u ∈ U, v ∈ V} a quantum representation of the elements of G. Then

    |uᵢ⟩|vⱼ⟩ ↦ ((1/√r) Σ_{k=1}^{r} (χᵢ^U(u_k))* |u_k⟩) ((1/√s) Σ_{l=1}^{s} (χⱼ^V(v_l))* |v_l⟩)    (3.8)
             = (1/√(rs)) Σ_{k=1}^{r} Σ_{l=1}^{s} (χᵢⱼ(u_k + v_l))* |u_k⟩|v_l⟩    (3.9)

is the Fourier transform on G.
The decomposition of the Fourier transform may be applied recursively to
groups U and V. It is also interesting to notice that the state (3.9) is decomposable.
3.1.2 Hadamard-Walsh Transform
Let us study a special case G =
are
of
IFr
IFr.
IFr. All the characters of the additive group
where y E
To implement the corresponding QFT we have to find a
quantum circuit that performs the operation
44
3. Fast Factorization
Ix)
f--+
L
(-I)"'·Y Iy) .
yEIF2'
The elements x = (Xl, X2,
sentation by m qubits:
... , Xm) E
1FT have a very natural quantum repre-
which corresponds exactly to the algebraic decomposition
Thus, it follows from Lemma 3.1.1 that it suffices to find QFT on 1F 2, and the
m-fold tensor product of that mapping will perform QFT on 1FT. However,
QFT on 1F2 with representation {IO), II)} is defined by
1
+ 11)),
10)
f--+
J2(10)
11)
f--+
J2(10) -11)),
1
which can be implemented by the Hadamard matrix (which is also denoted
by W 2 )
H
1 (11 -11) .
= J2
Thus, we have obtained the following result:
Lemma 3.1.2. Let Hm
= H®·· ·®H (m
times). Then for Ix)
= IXI)·· ·Ix m )
(3.10)
For any m, the matrix Hm is also called a Hadamard matrix. QFT (3.10)
is also called a Hadamard transform, Walsh transform, or Hadamard- Walsh
transform.
3.1.3 Quantum Fourier Transform in Zn
All the characters of group Zn are
27rI,xy
Xy(x) = e n ,
where x and yare some representatives of cosets (See Sections 8.1.3, 8.1.4,
and 8.2). To simplify the notations, we will denote any coset k+nZ by number
the k. Using these notations,
3.1 Quantum Fourier Transform
45
Zn = {0,1,2, ... ,n-l},
where addition is computed modulo n. This notation will be used hereafter.
The corresponding QFT on Zn is the operation
Ix) --+
L e- 2~~xy Iy) .
n-l
(3.11)
y=o
Now we have the problem that, unlike for lF2', there is no evident way to give
a quantum representation for the elements of Zn in such a way that the basis
10),11),12), ... , In -1)
could be represented as a tensor product of two smaller bases representing
some subsystems of Zn. If, however, we know some factorization n = nln2
such that gcd(nl, n2) = 1, the Chinese Remainder Theorem (Theorem 8.1.2)
offers us a decomposition: according to the Chinese Remainder Theorem,
there is a bijection
given by F((k l , k 2 )) = aln2kl + a2nlk2, where al (resp. a2) is the multiplicative inverse of n2 (resp. nl) modulo nl (resp. n2)' By Exercise 2, the
mapping F is, in fact, an isomorphism. We also notice that, since al (resp.
a2) has multiplicative inverse, the mapping kl f--7 a1k 1 (resp. k2 f--7 a2k2) is a
permutation of Zn! (resp. Zn2)' and this is all we need for decomposing the
QFT (3.11).
Assume that the QFTs are available for Zn! and Zn2' i.e., we have quantum representations
{IO), 11), ... , Inl - I)} and {IO), 11), ... , In2 -I)}
of Zn! and Zn2 and we also have the routines for mappings
We use the quantum representation Ik) = Ik l ) Ik 2) for the elements of Zn
given by the Chinese Remainder Theorem with k = aln2kl + a2nlk2. By
constructing a quantum circuit that performs the multiplications (k 1 , k2 ) f--7
(alk l , a2k2) and concatenating that with the QFT circuits on both components, we get
46
3. Fast Factorization
1
.jnl n 2
1
.jnln2
1
L L
n1-1 n2- 1
e_2"i(a1n21~~~~a2n112k2) Ill) Il2)
11 =0 12 =0
L L
n1-1 n2- 1
e_2.-iF(~;:'1/2k2) Ill) Il2)
11 =0 12 =0
n1-1 n2-1
"""""' """""'
~ ~ ~ e-
2.-iF(k 1 .k2)F(11
n1 n 2
ynl n 2 11 =0 12 =0
h)
IF(h, l2)).
This decomposition can be applied recursively to nI and n2.
Notice carefully that the previous decomposition requires some factorization n = n1n2 such that gcd(nl' n2) = 1, but generally the problem is to find
a nontrivial factorization of n. Shor's factorization algorithm makes use of
the inverse QFT on 2 2 =, and since there is no coprime factorization for 2m ,
we have to be ready to offer something else. Fortunately, we can also obtain
some more knowledge on QTFs on groups 2 2 =.
Since the inverse Fourier transform in 2n is quite symmetric to (3.11),
(e- 2"~xy is replaced with e 2"~XY), choosing which one to study is a matter of
personal preference. In what follows, we will learn how to implement the inverse Fourier transform on 2 2 =. Group 2 2 = has a very natural representation
by using m qubits: an element x = Xm_12m-1 + Xm_22m-2 + ... + X12 + xo,
where each Xi E {O, I}, is represented as
How do to implement the inverse QFT
2"'_1
Ix) H
~ ~ e 2;:,f YIy)
(3.12)
with a quantum circuit? To approach this problem, we can first make an
observation which should no longer be a surprise more after the previous
decomposition results: The superposition on the right-hand side of (3.12) is
decomposable.
Lemma 3.1.3.
L
2=-1
e 2;;,f Y Iy)
y=o
= (10) + e* 11))(10) + eW 11))··· (10) + e2;;.i.:! 11))
(3.13)
3.1 Quantum Fourier Transform
47
Proof. Representing Iy) = Iy'b) = Iy') Ib), where y' are the m - 1 most significant bits and b the least significant bit of y, we can divide the sum into
two parts:
y=o
=
y'=o
y'=o
2=-1_1
2=-1_1
L
e 2"2\~fY Iy) 10)
+
L
e 2"2i;''''Ve~ Iy) 11)
y=o
y=o
2=-1_1
=
L
e;;;::Y Iy) (10) +e 2 ;;.i: t 11)).
y=o
o
The claim now follows by induction.
Decomposition (3.13) also gives us a clue on how to compute transform (3.12)
by using a quantum circuit. The lth phase of 11) in (3.13) depends only on
bits xo, Xl, ... , Xl-1 and can be written as
7ri(2
exp (
m-
1x m _1
_
(7ri(21-1Xl_1
- exp
+ 2m - 2x m _2 + ... + 2X1 + xo))
21-
1
+ 21-2xl_2 + ... + 2X1 + xo))
21-
1
_
(7riXI_1)
(7riXI-2)
- exp ~ exp -2-1-
..
(7riX1)
( 7rixO)
·exp 21- 2 exp 21- 1
7rixO)
-_ (I)XI-1
exp (7riXI_2)
- - - ... exp (7riX1)
- - exp ( 21
2/- 2
21- 1
We can now describe a quantum circuit that performs operation (3.12): given
a quantum representation
(3.14)
we swap the qubits to get the reverse binary representation
(3.15)
In practice, this swapping is not necessary, since we could operate as well
with the inverse binary representations; we included the swapping here just
for the sake of completeness.
On state (3.15) we proceed from right to left as follows: Hadamard transform on the mth (the right-most) qubit gives
and we complete the phase (_I)x=-1 to
48
3. Fast Factorization
by using phase rotations
wi
wi
exp( 2 1 )' ... ,exp( 2m -
wi
2 )'
exp( 2m -
1)
conditionally. That is, for each l E {I, 2, ... ,m -I}, a phase factor exp( 2':~1)
is introduced to the mth bit if and only if mth and lth qubits are both 1.
This procedure will yield state
The same procedure will be applied from right to left to qubits at locations
m - 1, ... , 2, and 1: for a qubit at location l we first perform a Hadamard
transform to get the qubit in state
then, for each k in range l - 1, l - 2, ... , 1, we introduce phase factor
wi
exp (2 1_ k
)
conditionally if and only if the lth and kth qubit are both 1. That can be
achieved by mapping
10) 10) r-+ 10) 10)
10) 11) r-+ 10) 11)
11) 10) r-+ 11) 10)
11) 11) r-+ e~ 11) 11),
which acts on the lth and kth qubits. The matrix of the mapping can be
written as
cPk,1 =
(
100
010
001
0 )
0
~i
•
(3.16)
OOOe2I=
Ignoring the swapping (X m -1,X m -2, ... ,Xo) r-+ (xo,x1, ... ,xm -d, this procedure results in the network in Figure 3.1 with ~m(m + 1) gates.
In Figure 3.1, the subindices of the gates cP are omitted for typographical
reasons. The cP-gates are (from left to right) cPm-1,m-2, cPm-1,m-3, cPm-1,O,
cPm-2,m-3, cPm-2,O, and cPm-3,O.
3.1 Quantum Fourier Transform
Xm-l
49
~
X m -2
Yl
X m -3
Y2
Xo
Ym-l
Fig. 3.1. Quantum network for QFT on
Z2ffl
3.1.4 Complexity Remarks
By the traditional discrete Fourier transform one usually understands the
Fourier transform in Zn given by
1(x) =
n-l
L
e- 2"!XY f(y)·
(3.17)
y=O
For practical reasons, n is typically chosen as n = 2m . Fourier transform
(3.17) can be used to approximate the continuous Fourier transform, and has
therefore tremendous importance in physics and engineering.
By computing (3.17), we understand that we are given vector
(1(0), f(1), ... , f(2 m
-
1))
(3.18)
as the input data, and the required output is
(1(0),1(1), ... ,1(2m
-
1)).
(3.19)
The naive method for computing the Fourier transform is to use formula
(3.17) straightforwardly to find all elements in (3.19), and the time complexity
given by this method is O((2m)2), as one can easily see.
A significant improvement is obtained by using fast Fourier transform
(FFT), whose core can be expressed in a decomposition
50
3. Fast Factorization
which very closely resembles the decomposition of Lemma 3.1.3, and essentially states that the vector (3.19) can be computed by combining two Fourier
transforms in Z2",-1. The time complexity obtained by recursively applying the above decomposition is O(m2m), significantly better than the naive
method.
The problem of computing QFT is quite different: in a very typical situation we have, instead of (3.18), a quantum superposition
(3.20)
and QFT operates on the coefficients of (3.20). At the same time, the physical
representation size of (3.20) is small; in system (3.20) there are only m qubits,
yet there are 2m coefficients Ci. Earlier we learned that QFT in Z2'" can be
done in time O(m 2 ) (Hadamard-Walsh transform can be done even in time
o (m)), which is exponentially separate from the classical counterparts of
Fourier transform. But the major difference is that the physical representation
sizes of (3.18) and (3.20) are also exponentially separate, the first one taking
!t(2m) bits and the latter one m qubits. Later, we will learn that the key
idea behind many interesting fast quantum algorithms is the use of quantum
parallelism to convey some information of interest into the coefficients of
(3.20) and then compute QFT rapidly.
3.2 Shor's Algorithm for Factoring Numbers
3.2.1 From Periods to Factoring
Given two prime numbers p and q, it is an easy task to compute the product n = pq. The naive algorithm already has quadratic time complexity
O(max{JpJ ,JqJ}2), but, by using more sophisticated methods, even performance O(max{JpJ ,JqJ}l+f) is reachable for any f > 0 (see [20]). On the other
hand, the inverse problem, factorization, seems to be extremely hard to solve.
Nowadays it is a widespread belief that there is no efficient classical algorithm for solving the factorization problem: given n = pq, a product of two
primes, find out p and q. This problem has great importance in cryptography:
the reliability of the famous public-key cryptosystem RSA is based on the
assumption that no efficient algorithm exists for factorization.
In 1994, Peter W. Shor [63] introduced a probabilistic polynomial-time
quantum algorithm for factorization, which will be presented in this chapter.
First we show how to reduce the factoring to finding orders of elements in
Zn. Let
(3.21)
be the (unknown) prime factorization of an odd number (the powers of 2 can
be easily recognized and canceled). We will also assume that k ;::: 2, i.e., n
3.2 Shor's Algorithm for Factoring Numbers
51
has at least two different prime factors. In fact, there is a polynomial-time
algorithm for recognizing powers (see Exercise 3), so we may as well assume
that n is not a prime power. We can be satisfied if we can find any non-trivial
factor of n rapidly, since the process can be recursively applied to all of the
factors in order to uncover the whole factorization (3.21) swiftly.
Let us randomly choose, with uniform probability, an element a E :In,
a i=- 1. If d = gcd(a, n) > 1, then d is a nontrivial factor of n and our task
is complete. If d = 1, assume that we could somehow extract r = ordn(a)
(ordn(a) means the order of element a in :In, see Sections 8.1.3 and 8.1.4 for
the number-theoretic notions in this section). Then
ar == 1 (mod n),
which means that n divides a r - 1. Of course this does not yet offer us a
method for extracting a nontrivial factor of n, but if r is even, we can easily
factorize a r - 1:
ar
-
1 = (a~ - l)(a~
+ 1).
Now, since n divides a r -1, n must share a factor with a~ -lor with a~ + 1
(or with both), and this factor can be extracted by using Euclid's algorithm.
How could we guarantee that the factor of n obtained in this way is
nontrivial? Namely, if n divides a~ ± 1, then Euclid's algorithm would only
give n as the greatest common divisor. Fortunately, it is not so likely that n
divides a~ ± 1, as we will soon demostrate. First, n I (a~ - 1) would imply
that
a¥ == 1 (mod n),
which would be absurd by the very definition of order. But it may still happen
that n just divides d + 1, and does not share any factor with a~ -1. In this
case, the factor given by Euclid's algorithm would also be n. On the other
hand, n I a ~ + 1 means that
a¥ == -1
(mod n),
i.e., a~ is a nontrivial square root of 1 modulo n. In the next section we will
use elementary number theory to show that, for a randomly chosen (with
uniform distribution) element a E :l~, the probability that r = ord(a) is
even, and that a ~ ¢. -1 (mod n) is at least
Consequently, assuming
that r = ord(a) could be rapidly extracted, we could, with a reasonable
probability, find a nontrivial factor of n.
There may be still some concern about the magnitude of numbers a ~ ± 1. Is
it possible that these numbers would be so large that the procedure could be
inefficient anyway? Fortunately, it is easy to answer this question: divisibility
by n is a periodic property with period n; so, once r = ord(a) is found, it
;6.
52
3. Fast Factorization
suffices to find a ~ ± 1 modulo n (for modular exponentiation, rapid algorithms
are known).
We may now focus on the problem of finding r = ord(a). Since n is odd
but not a prime power, group Z~ is never cyclic, but each element a E Z~
has an order r = ord(a) smaller than ~'P(n) < ~. By the definition of the
order,
for any integer s, which implies that the function
= ak
f(k)
(mod n)
f : Z ---+ Zn
defined by
(3.22)
has r as the period. We will demonstrate in Section 3.2.3 that it is possible
to find this period rapidly by using QFT.
Example 3.2.1. 15 = 3·5 is the smallest number that can be factorized by
the above method.
Zi5
= {1,2,4, 7,8,11, 13, 14},
and the orders of the elements are 0, 4, 2, 4, 4, 2, 4, 2 respectively. The
only element for which d == -1 (mod 15) is a = 14 == -1 (mod 15). If,
for example, a = 7, then 15 I 74 - 1 and 7 2 - 1 == 3 (mod 15), 72 + 1 == 5
(mod 15). We get 3 = gcd(3, 15) and 5 = gcd(5, 15) as nontrivial factors of
15.
3.2.2 Orders of the Elements in Zn
Let n = p~' ... p~k be the prime factorization of an odd n, and a E Z~ a
randomly (and uniformly) chosen element. The Chinese Remainder Theorem
gives us a useful decomposition
Z*n
=
Z*"
x ... x Z*Pk'k
P,
(3.23)
of Z~ as a direct product of cyclic groups. Recall that the cardinality of each
I
factor is given by IZp~i = 'P(p~') = p~i-l(pi - 1) (see Section 8.1.4). Note
especially that each Zp~i has an even cardinality. For the decomposition of
an element a E Z~ we will use notation
(3.24)
and the order of an element a in Z;, will be denoted by ord p ' (a).
Because of the decomposition (3.24), to choose a random element a E Z~
is to choose k random elements as in (3.24) and vice versa. The following
technical lemma will be quite useful for the probability estimations.
3.2 Shor's Algorithm for Factoring Numbers
53
Lemma 3.2.1. Let c,o(pe) = 2uv, where u 2: 1, 2 f v and s 2: 0 a fixed integer.
Then the probability that a randomly (and uniformly) chosen element a E
has an order of form 2S t with 2 f t is at most!.
Z;.
Proof. If s > u, then the probability in question is 0, so we may assume
s ~ u. Let g be a generator of
Then
= {gO,gl, ... ,g2 v-I}, and
Z;..
Z;.
U
.
2"v
ord(gJ) = gcd(j, 2uv) ,
so an order of form 2S t occurs if and only if j = 2u - s w, where 2 f w. In the set
{O, 1,2, ... ,2uv -1} there are exactly 2Sv multiples of 2u- s , namely o· 2u- s ,
1· 2u- s , ... , and (2 Sv -1)2"-S, but only half of them have an odd multiplier.
Therefore, the required probability is
S = __
12 s <_1
2U v
2 2u - 2'
! _
·2_
v
_2
o
as claimed.
We can use the previous lemma to estimate the probability of having an odd
order.
Lemma 3.2.2. The probability that r = ord( a) is odd for a uniformly chosen
a E Z~ is at most f,..
Proof. Let (al, ... , ak) be the decomposition (3.24) of an element a E Z~.
Let ri = ordp :' (ai). Since r = lcm{rl" .. ,rk} (Exercise 4), r is odd if and
only if each ri is odd.
Putting s = 0 in Lemma 3.2.1, we learn that the probability of having a
Therefore, the probability PI
random ai E Zp;' with odd order is at most
of having odd r is at most
!.
o
as claimed.
What about the probability that a~ == -1 (mod n)? It turns out that
Lemma 3.2.1 is useful for estimating that probability, too.
Lemma 3.2.3. Let n = p~l ... p~k be the prime decomposition of an odd n
and k 2: 2. If r = ordn (a) is even, then the probability that a ~ == -1 (mod n)
is at most f,..
Proof. Congruence a;:
a~
== -1 (mod n) implies that
== -1 (mod p:i)
(3.25)
54
3. Fast Factorization
for each i. Let ri = ordp~i (ai), so r = lcm{r1"'" rd. We write r = 2S t and
ri = 2S 't i , where 2 f t and 2 f t i . From the fact that ri I r for each i, it follows
that Si ~ S, but the congruencies (3.25) can only hold if Si = S for every i.
For if Si < S for some i, then also ri I ~ (the assumption k 2: 2 is needed
here!), which implies that
a~
== 1 (mod
p~i).
(3.26)
But (3.26) together with (3.25) gives 1 == -1 (mod p~i), which is absurd since
Pi -=I- 2.
Therefore, the probability P2 that a¥ == -1 (mod n) is at most the probability that Si = S for each i, which is
P2 ~
1
1
2'" 2 =
1
2k
o
by Lemma 3.2.1.
Combining Lemmata 3.2.2 and 3.2.3, we get the following lemma.
Lemma 3.2.4. Let n = p~l ... p~k be the prime factorization of an odd n
with k 2: 2. Then, for a random a E Z~ (chosen uniformly), the probability
that r = ordn(a) is even and a~ t= -1 (mod n) is at least (1 - :j.)2 2: 196'
3.2.3 Finding the Period
By Lemma 3.2.4 we know that, for a randomly (and uniformly) chosen a E Z~,
the probability that r = ordn (a) is even and a ~
-1 is at least 196 for any
odd n having at least two distinct prime factors. It was already discussed in
Section 3.2.1 that the knowledge about the order of such an element a allows
us to find efficiently a nontrivial factor of n. Thus, a procedure for computing the order would provide an efficient probabilistic method for finding the
factors. Therefore, we have a good reason to believe that finding r = ord n (a)
is computationally a very difficult task.
It is clear that finding r = ordn (a) reduces to finding the period of function
f : Z -t Zn defined by
t=
f(k) = a k
(mod n).
(3.27)
In fact, if the period of f is r, then aT = aD == 1 (mod n).
It is a well-known fact that Fourier transform can be used for extracting information about the periodicity (see Section 8.2.5), and we will show
that, by using QFT, the period of (3.27) can be found with non-negligible
probability.
Of course, we cannot compute the QFT of (3.27) on whole Z, so we choose
a domain Zm = {O, I, ... , m - I} with m > n large enough to give room for
the period to appear. We will also choose m = 21 for two reasons: the first is
3.2 Shor's Algorithm for Factoring Numbers
55
that then there is a natural quantum representation of elements of Zm by l
qubits; and the second is that we have already learned in Section 3.1.3 how
to compute the QFT in Z21. We will fix this l later.
The procedure requires quantum representation of Zm and Zn (the latter
can be replaced by some set including Zn). The first takes l qubits and the
other takes at most l qubits. For the representation, notation Ix) Iy), where
x E Zm, Y E Zn, will be used.
Quantum Algorithm for Finding the Orders
1. The first stage is to prepare a superposition
Jm L Ik) 10)
m-l
(3.28)
k=O
by beginning with 10) 10) and using Hadamard-Walsh transform in Zm.
2. The second step is to compute k H a k (mod n) to get
(3.29)
Since the function k
as
H
a k (mod n) has period
T,
(3.29) can be written
(3.30)
where SI is the greatest integer for which SIT + l < m. It is clear that
cannot vary very much: we always have !!!
- 1 - i < SI < !!! - i.
T
r r
r
3. Compute the inverse QFT on Zm to get
1 ~ ~ 1 ~l
rm
rm
-~~-~e
=~
1=0 q=O
LLe
r-l m-l
1=0 p=O
2wip(qr+l)
'"
SI
I I)
Ip) a
p=O
L e 2W~!rq Ip) la
s
2w';'pl
l) .
(3.31 )
q=O
4. Observe the quantum representation of Zm in superposition (3.31) to get
some p E Zm.
5. Find the convergents f. of the continuous fraction expansion (see Section
8.4.2) of;; using Euclid's algorithm and output the smallest qi such that
a qi == 1 (mod n), if such qi exists.
In the next section we will estimate the correctness probability of this method.
Example 3.2.2. Let n = 15 and a
found. Let us choose m = 16.
= 7 be the element whose period is to be
56
3. Fast Factorization
1. The first step is to prepare
15
~L
Ik) 10).
k=O
2. Computation of k
f--t
7 k (mod 15) gives
~(IO) 11) + 11) 17) + 12) 14) + 13) 113) + 14) 11) + ... + 115) 113))
=
~ ( ( 10) + 14) + 18) + 112) ) 11)
+( 11) + 15) + 19) + 113) ) 17)
+( 12) + 16) + 110) + 114) ) 14)
+( 13) + 17) + 111) + 115) ) 113) ).
3. The inverse QFT in
Z16
yields
~ (( 10) + 14) + 18) + 112) ) 11)
+( 10) + i 14) -18) - i 112) ) 17)
+( 10) -14) + 18) -112) ) 14)
+( 10) - i 14) -18) + i 112) ) 113) ).
4. The probabilities to observe elements 0, 4, 8, and 12 are ~ for each.
5. The only convergent of 1~ is just
which does not give the period. The
convergents of 1~ are ¥ and ~, and the latter one gives the correct period
4. Observation of 8 does not give the period either, since the convergents
of 186 are and
but 12 gives, the convergents of ~~ are
1, and ~.
¥,
!,
¥
¥,
3.3 The Correctness Probability
3.3.1 The Easy Case
The probability of seeing a particular p E Zm when observing (3.31) is
p (P) =
L
r-1
1=0
-
1
m2
~
e '"
Le
Sl
q=O
2"ipqr
'"
12
L 1L e
1 r=-
m2
1
Sl
2"ipqr
'"
12
(3.32)
1=0 q=O
We will show that, with a non-negligible probability, the value p received
observing (3.31) will give us the period r.
To get some insight into how to evaluate the probability (3.32), we will
first assume that r happens to divide m. Recalling that SI is the greatest
3.3 The Correctness Probability
integer such that Sir + 1 < m, and that 0 ~ 1 <
each l. In this case, (3.32) becomes
pep) =
r
m2
r,
we see that
Sl = ~
57
-1 for
2
m/r-l
e tn/r
'""'~
~
(3.33)
q=O
The sum in (3.33) runs over all of the characters of 7I.. m/ r evaluated at p. Thus
(see Section 8.2.1),
m/r-l
L
e~
'ffi/r
q=O
-
-
{
!!:!,
r
0
. .
If P = 0 m 7I..m/r
otherwise,
and therefore
P( ) = { ~, if p = ~ . ~ for some d
p
0, otherwIse.
Thus, in the case rim, observation of (3.31) can give only some p in the
set {O· ~, 1 . ~, ... , (r - 1) . ~}, any such with a probability of ~. Now
that we know m and have learned p = d· ~ by observing (3.31), we may
try to find r by canceling;;' = ~ into an irreducible fraction using Euclid's
algorithm. Unfortunately, this works for certain only if gcd(d, r) = 1; in case
gcd(d, r) > 1 we will just get a factor of r as the nominator of ;;. = ~.
Fortunately, we can show, by using the number-theoretical results, that the
probability of having gcd(d, r) = 1 does not converge to zero too fast.
Example 3.3.1. In Example 3.2.2, the period r = 4 divided m = 16, and the
only elements that could be observed were 0, 4, 8, 12, the multiples of 4 = ~6.
However, 0 and 8 did not give the period, since the multipliers 0 and 2 share
a factor with 4.
3.3.2 The General Case
Remembering that m = 21, it is clear that we cannot always be lucky enough
to have a period r dividing m. We can, however, ask what is the probability
of observing a p which is close to a multiple of ~. For if
is small enough and gcd(d, r) = 1, then ~ is a convergent of ;;., and all the
convergents of ;;. can be found efficiently by using Euclid's algorithm. (see
Section 8.4.2).
We will now fix m in such a way that the continued fraction method will
apply. For any integer d E {O, 1, ... , r - I}, there is always a unique integer
p such that the inequality
58
3. Fast Factorization
1
m
1
-- <p-d- <2
r - 2
(3.34)
holds. Now choosing m such that n 2 ~ m
< 2n2 guarantees that
p
i-l<l- < -1
- -d
<
Im
r - 2m - 2n 2
2r2'
which implies by Theorem 8.4.3 that if gcd(d, r) = 1, then ~ is a convergent
to .E...
m
But there are only r integers p in Zm that satisfy (3.34), one for each
d E {O, 1, ... ,r - I}. What is the probability of observing one? In other
words, we should estimate the probability of observing p which satisfies
r
Ipr-dml ~"2
(3.35)
for some d.
Lemma 3.3.1. For n 2: 100, the observation of {3.31} will give apE Zm
such that Ipr - dml ~ ; with a probability of not less than ~.
Proof. Evaluating (3.32) (Exercise 5) we get
for any fixed d by the periodicity of sin 2 . Since we now assume that x
pr - dm takes values in [-i, iJ, we will estimate
°
It is not difficult to show that f(x) is an even function, taking the maximum
(Sl + 1)2 at x =
and minima in [-i, il at the end points ±i. Therefore,
2
f( ) > sin ~(Sl + 1)
x_
.2',T
SIn
Since
SI
2m
is the greatest integer such that sir
7r
r
7rr
-(1 - -) < - ( S I
2
m
2m
and
7r
r
+ 1) < -(1
+ -),
2
m
+ 1 < m,
we have
3.3 The Correctness Probability
f(x) 2:
sin2 1L(1 - L)
? 2 7rr m
SIn
59
(3.36)
2m
By the choice of m 2: n 2 , :;;. is negligible, and we can use estimations sin x ::; x
and sin 2 ~(1 +x) 2: 1- (~x)2 (Exercise 6) to get
f(x) 2:
:2(~f(l- (~:)2),
(3.37)
which implies that
4 -1 ( 1- (--)
71" r 2)
P(p) 2: .
71"2 r
2m
(3.38)
Factor
71" r
1- (--)
2m
2
(3.39)
in (3.38) tends to 1 as :;;. -t 0, and for n 2: 100, (3.39) is already greater than
0.9999. Thus, the probability of observing a fixed p such that -~ < p-dr; ::;
~ is at least ~: for any n 2: 100.
But there are exactly r such values p E Zm that -~ < p - dr; ::; ~:
namely, the nearest integers to dr; for each d E {O, 1, ... , r - I}. Therefore,
the probability of seeing some is at least ~ if n 2: 100.
0
We have learned that an individual p E Zm such that
1
m
1
-- <p-d- <2
r - 2
(3.40)
ir'
is observed with a probability of at least
Therefore, any corresponding
d E {O, 1, 2, ... ,r - I} is also obtained with a probability of at least
We
should finally estimate what the probability is that for such d also gcd( d, r) =
1 holds.
The probability that gcd(d, r) = 1 for a given element d E {O, 1, ... ,r-l}
is clearly 'f'~) , which can be estimated by using the following result.
ir'
Theorem 3.3.1 ([59]). For r 2: 3,
r
'Y
2.50637
cp (
r ) < e log log r + 1og 1ogr '
where 'Y = 0.5772156649 ... is the Euler's constant. I
Lemma 3.3.2. For r 2: 19, the probability that, for a uniformly chosen d E
{O,l, ... ,r-l} gcd(d,r) = 1 holds, is at least 41og\ogn'
1
i
Euler's constant is defined by I = limn-+oo(1 + ~ + + ... ~ -logn}. In [59J it
is even shown that in the above theorem, 2.50637 can be replaced with 2.5 for
all numbers r but r = 2 . 3 . 5 . 7 . 11 . 13 . 17· 19 . 23.
60
3. Fast Factorization
Proof. It directly follows from Theorem 3.3.1 that for r
r
2': 19,
< 4 log log r.
<p(r)
Therefore
<p(r)
1
r
4 log log r
-- >
1
> ...,.-:-----:--
4 log log n '
which was claimed.
D
We combine the following facts to get Lemma (3.3.3):
• The probability that, for a randomly (and uniformly) chosen a E :En, the
order r = ord n (a) is even and d :f:. -1 (mod n) is at least
(Lemma
3.2.4).
dr;!' < ~
• The probability that observing (3.31) will give a p such that
is at least ~ (Lemma 3.3.1).
• The probability that gcd(d,r) = 1 is at least 41og\ogn (Lemma 3.3.2).
t6
Ip -
I
Lemma 3.3.3. The probability that the quantum algorithm finds the order
of an element of :En is at least
9
1
160 log log n·
Remark 3.3.1. It was already mentioned in Section 3.1.4 that many interesting quantum algorithms are based on conveying some information of interest
into the coefficients of quantum superposition and then applying fast QFT.
The period finding in Shor's factoring algorithm is also based on this method,
as we can see: to use quantum parallelism, it prepared a superposition
1
m-l
Vm {; Ik) 10) .
The information of interest, namely the period, was moved into the coefficients by computing k H ak (mod n) to get
In the above superposition, each
L
Sl
q=O
Iqr
+ l)
lal ) has
(3.41)
3.3 The Correctness Probability
61
as the left multiplier. But in (3.41), a superposition of basis vectors Ix),
x E Zm, already contains information about the period in the coefficients:
coefficient of Ix) is nonzero if and only if x = qr+l, so the coefficient sequence
of (3.41) also has period r, and this period is to be extracted by using QFT.
Notice that the elements a l are distinct for l E {O, 1, ... , r -I}, and hence
it is guaranteed that the left multipliers (3.41) do not interfere. Thus, we can
say that the computation k f-t a k was the operative step in conveying the
periodicity to the superposition coefficients.
3.3.3 The Complexity of Shor's Factorization Algorithm
We can summarize the previous sections in the following algorithm:
Shor's quantum factorization algorithm
Input: An odd number n that has at least two distinct prime factors.
Output: A nontrivial factor of n.
1. Choose an arbitrary a E {I, 2, ... n - I}
2. Compute d = gcd(a, n) by using Euclid's algorithm. If d > 1, output d
and stop.
3. Compute number r by using the quantum algorithm in Section 3.2.3
(m = 21 is chosen such that n2 ~ n < 2n 2). If ar ¢. 1 (mod n) or r is
odd or ai == -1 (mod n), output "failure" and stop.
4. Compute d± = gcd(n, d ± 1) by using Euclid's algorithm. Output numbers d± and stop.
Step 2 can be done in time O(f(n)3).2 For step 3 we first check whether
a has order less than 19, which can be done in time O(f(n)3). Then
we use Hadamard-Walsh transform in Zm, which can be done in time
O(f(m)) = O(f(n)). After that, computing ak (mod n) can be done in time
O(f(m)f(n)2) = O(f(n)3). QFT in Zm will be done in O(f(m)2) steps, and
finally the computation of the convergents can be done in time O(f(n)3). Step
4 can also be done in time O(f(n)3).
The overall complexity of the above algorithm is therefore O(f(n)3),
but the success probability is only guaranteed to be at least Q( log l~g n) =
QCog~(n)). Thus, by running the above algorithm O(log(f(n))) times, we obtain a method that extracts a nontrivial factor of n in time O( f( n )3 log f( n))
with high probability.
2
Recall that the notation i(n) stands for the length of number n; that is, the
number of the digits needed to represent n. This notation, of course, depends on
a particular number system chosen, but in different systems (excluding the unary
system) the length differs only by a multiplicative constant, and this difference
can be embedded into the O-notation.
62
3. Fast Factorization
3.4 Excercises
1. Let G be an abelian group. Find the Fourier transform of
defined by
1()
g
=
1 : G --+
C
{ 1, if g = gi,
0, otherwise.
If 1 is viewed as a superposition (see connection between formulae (3.3)
and (3.4)), which is the state of H corresponding to 1?
2. Let n = nln2, where gcd(nl, n2) = 1. Let also
be the function given by F((k l , k 2)) = aln2kl + a2nlk2, where al (resp.
a2) given by the Chinese Remainder Theorem, is the multiplicative inverse of n2 (resp. nd modulo nl (resp. n2)' Show that F is an isomorphism.
3. a) Let nand k be fixed natural numbers. Device a polynomial-time algorithm that decides whether n = xk for some integer x.
b)Based on the above algorithm, device a polynomial-time algorithm
which tells whether a given natural number n is a nontrivial power of
another number.
4. Let n = p1 1 ••• p~k be the prime factorization of n and a E Z~. Let
(al"'" ak) be the decomposition of a given by the Chinese Remainder
Theorem and Ti = ordp;i (ai). Show that ordn(a) = lcm{TI, ... , Tk}.
2 = 4sin2~.
5. a) Prove that le ix
b) Prove that
_11
I te~12
q=O
sin2
71'pr~+l)
sin 2 7!E!:
m
6. Use Taylor series to show that sin 2 ~(1
+ x)
;::: 1- Gx)2.
4. Finding the Hidden Subgroup
In this brief chapter we present a quantum algorithm, which can be seen as
a generalization of Shor's algorithm. We can here explain, in a more structural manner, why quantum computers can speed up some computational
problems. The so-called Simon's hidden subgroup problem can be stated in
a general form as follows [12J:
Input: A finite abelian group G and function p : G -+ R, where R is a finite
set.
Promise: There exists a nontrivial subgroup H
and distinct on each coset of H.
~
G such that p is constant
Output: A generating set for H.
The function p is said to fulfill Simon's promise with respect to subgroup
H. If h E H, then elements 9 and 9 + h belong to the same coset of H (see
Section 8.1 for details), and since p is constant on the cosets of H, we have
that p(g + h) = p(g). We also say that pis H-periodic.
In Section 4.2 we will see that some interesting computation problems can
be reduced to solving the hidden subgroup problem. In a typical example of
this problem, IGI is so large that an exhaustive search for the generators of
H would be hopeless, but the representation size 1 of an individual element
9 EGis only 8(log IGI). Moreover, G can be typically described by a small
number of generators. Here we will consider the description size of G, which
is usually of order (log IGl)k for some fixed k as the size of the input. It is
also usually assumed that function p : G -+ R can be computed rapidly (in
polynomial time) with respect to log IGI.
For better illustration we will present the generalized form of Simon's
quantum algorithm on 1F2' for solving the hidden subgroup problem. Note
also that in group 1F2' any subgroup is a subspace as well, so instead of
asking for a generating set, we could ask for a basis. By using the GaussJordan elimination method one can efficiently find a basis when a generating
set is known (see Exercise 4).
1
The elements of G can be represented as binary strings, for instance.
64
4. Finding the Hidden Subgroup
4.1 Generalized Simon's Algorithm
In this section we will show how to solve Simon's subgroup problem on G
1F2', the addivite group of m-dimensional vector space over binary field IF 2
{O, I}.
=
=
4.1.1 Preliminaries
The results represented here will also apply if IF 2 is replaced with any finite
field IF, so we will, for a short moment, describe them in a more general form.
More mathematical background can be found in Section 8.3.
Here we temporarily use another notation for the inner product: if x =
(Xl, ... , xm) and y = (YI, ... , Ym) are elements of IFm, we denote their inner
product by
x . y = XIYI
+ ... + XmYm·
An element y is said to be orthogonal to H if y . h = 0 for each h E H.
It is easy to verify that elements orthogonal to H form a subgroup (even a
subspace) of IFm, which is denoted by H1.. The importance of the following
simple lemma will become clear in a moment.
Lemma 4.1.1. For any y E IFm,
' " (_l)h. Y
~
hEH
= {IHI, ify
0,
E H1.,
(4.1)
otherwzse.
Proof. This is analogous to proving the orthogonality of characters (Section
8.2.2). If y E H1., then h . y = 0 for each h E H, and the claim is obvious.
If y rf. H 1., then there exists an element hI E H such that hI . y i= 0, and
therefore
S
L ( - l t = L (_l)(h+h,).y
hEH
hEH
Y
= (_1)h L (_l)h· Y = (_1)h y S,
=
Y
1·
1•
hEH
hence S
= o.
o
Let H E IFm be a subspace having dimension d. If X is any generating
subset of H, we can, by Exercise (4), efficiently find a basis of H. Any d x m
matrix M over IF, whose rows are a basis of H, is called a generator matrix of
H. The name "generator matrix" is well justified by Exercise (1). It follows
from Exercises (2), (3), (4), and (5) that, once a generator matrix of H is
given, we can efficiently compute a generator matrix of H 1.. Moreover, since
H = (H 1. ) 1. (verify), it follows that in order to find a generating set for H,
it suffices to find a generating set for H 1. .
4.1 Generalized Simon's Algorithm
65
For a subgroup H of lFm, let T be some set that consists of exactly one element from each coset (see Section 8.1.2) of H. Such a set is called a transversal
of H in lFm and it is clear that ITI = [lFm : H] =
= 2;; . It is also easy to
verify that, since lFm = H ffi Hl., the equation IlFml = IHI·IHl.1 must hold.
1lF;'
4.1.2 The Algorithms
We will use m qubits to represent the elements of lFr and approximately
log2[lFr : H] = log2i;;, = m -log2lHI qubits for representing the values of
the function p : IF 2 --t R. Here the description size of G = lFr is only m, and
we assume that p is computable in polynomial time with respect to the input
size.
Using only function p, we can find a set of generators for H. It can be
shown [12] that if no other knowledge on p is given, Le., p is given as a
blackbox function (see Section 5.1.2), solving this problem using any classical
algorithm requires exponential time, even when allowing a bounded error
probability. On the other hand, we will now demonstrate that this problem
can be solved in polynomial time by using a quantum circuit.
The algorithm for finding a basis of H consists of finding a basis Y of H l. ,
then computing the basis of H. As mentioned earlier, the latter stage can be
efficiently performed by using a classical computer. The problem of finding a
basis for H 1. would be easier if we knew the dimension of H 1. in advance, as
we will see. In that case, the algorithm could be described as follows.
Algorithm A:
Finding the Basis (Dimension Known)
1. If d = dim H 1. = 0, output 0 and stop.
2. Use Algorithm B below to choose the set Y of d elements of H 1. uniformly.
3. Use the Gauss-Jordan elimination method (Exercise 4) to check whether
Y is an independent set. If Y is independent, output Y and stop; otherwise, give "failure" as output.
In a moment, we will see that the second step of the above algorithm will
give a basis of H1. with a probability of at least ~ (Lemma 4.1.2). Before that,
we will describe how to perform that step rapidly by using a quantum circuit.
It is worth mentioning that, in the above algorithm, a quantum computer is
needed only to produce a set of elements of H l..
Algorithm B:
Choosing Elements Uniformly
1. Using the Hadamard-Walsh transform on lFr, prepare
1
tnm
y2···
L
"'ElF;"
Ix) 10).
(4.2)
66
4. Finding the Hidden Subgroup
2. Compute p to get
1
v
rnm
2.. ·
L
"'EIF2'
Ix) Ip(x)) =
1
v
rnm
L L It + x) Ip(t)).
2.. · tET "'EH
3. Use the Hadamard-Walsh transform on
_1_
LL
_1_
(4.3)
1FT to get
L (-l)(t+"')·Y IY) Ip(t))
..j2ffi tET "'EH ..j2ffi yEIF2'
=
2~
= I~I
L L (-l?'Y L (-l)""Y IY) Ip(t))
tET yEIF2'
"'EH
L L
(_l)t. y IY) Ip(t)).
tETYEH.L
4. Make an observation to get an element Y E H 1- .
In the step 3, the equality between the the last and the second-last formulae follows from Lemma 4.1.1. It is easy to see that the probability of
observing a particular Y E H 1- is
P( ) = "'"' (l!fl(_1)t. y )2 = T IHI2 = I1FTI IHI2 = _1_
Y
~ 2m
22m
IHI l1Fm2 l2
IH 1-1 '
tET
so Algorithm B works correctly: elements of H 1- are drawn uniformly. It is
also clear that the above quantum algorithm runs in polynomial time.
For the second step of Algorithm A, we will now analyze the probability
that when choosing d elements uniformly from a d-dimensional subspace, the
chosen vectors are indpendent. The analysis is straightforward as seen in the
following lemma.
Lemma 4.1.2. Let t ::; d and Yl, ... , Yt be randomly chosen vectors, with
uniform distribution, in a d-dimensional vector space over 1F 2. Then the probability that Yl, ... , Yt are linearly independent is at least
i.
Proof. The cardinality of a d-dimensional vector space over IF 2 is 2d. Therefore, the probability that {Yl} is a linearly independent set is 2~d1, since only
choosing Y1 = 0 makes {yt} dependent. Suppose now that Yl, ... , Yi have
been chosen in such a way that S = {Yl, ... ,Yi-t} is a linearly independent
set. Now S generates a subspace of dimension i - 1. Hence, there are 2i - 1
choices for Yi that make S U {y;} linearly dependent. Thus, the probability
that the set {Y1," . , yt} is an independent set is
2d _ 20 2d _ 21
P
Then
=
2d
2d
( 4.4)
4.1 Generalized Simon's Algorithm
1
2d
t-l
p = II
2d _
i=O
t-l
2i
=
II
i=O
2d-i
2 d- i -
2d-t+i
t
1
=
II
i=l
67
1
2 d- t+ i -
Now
1
in 2p :::::
8
t
(
1)
1n 1 + 2i - 1 :::::
8
t
1
2i - 1 :::::
8 ~ . 38
t
1
2i
<
4
00
1
2i
412
3 2
3'
o
so 2~ < e~ < 2, hence p > ~.
Remark 4.1.1. Notice that if we build another algorithm (call it A') which
repeats Algorithm A whenever the output is "failure", then A' cannot give
us a wrong answer, but we cannot find a priori upper bound for the number
of repeats it has to make. Instead, we can say that on avemge, the number
of repeats is at most four. Such an algorithm is called a Las Vegas algorithm,
cf. Section 2.1.2. An interesting method for finding the basis in polynomial
time with certainty is described in [12].
Remark 4.1.2. Algorithm B for choosing an element in H1. resembles the
period-finding algorithm of Section 3.2.3. The first step was to prepare a
quantum superposition of all the elements of 1F2'. Then, by utilizing quantum parallelism, function p was computed simultaneously on all elements of
1F2'. This was the operative step to convey information of interest to the superposition coefficients, and this information was extracted by using QFT.
Notice that since p is different on distinct cosets, all the vectors Ip( t)) are
orthogonal and, therefore, the states
L
(_l)t. y IY) Ip(t))
yEHJ.
with different values of p( t) do not interfere.
To conclude this section, we describe how to find the basis of H 1., when
d
= dim H 1. is not known in advance. Because of Algorithm A, this is done
if we can describe an algorithm which can find out the dimension of H 1. .
The dimension of H1. can be found by utilizing Algorithm A (which uses
Algorithm B as a subprocedure). By Lemma 4.1.2 we see that, if we choose
uniformly t ::; d elements from a subspace of dimension d, then the probability
that the chosen elements are independent is at least
We utilize this as
follows: if D = dim H 1., the (unknown) dimension of H , we just guess that
the dimension of H 1. is d (0 ::; d ::; m), and run Algorithm A (with the d we
t
68
4. Finding the Hidden Subgroup
guessed). If d ::; D, we obtain, according to Lemma 4.1.2, a set of d linearly
independent vectors with a probability of at least
Once we get such a set
of vectors, we can tell, with certainty, that the guess d::; D was correct (the
number of linearly independent vectors that a vector space can contain is at
most the dimension of the space).
On the other hand, if d > D, the set of d vectors we obtain can never be
linearly independent. After repeating Algorithm A a number of times, we can
be convinced that our guess d > D was incorrect. Anyway, Algorithm A can
always be used to decide whether d ::; D holds with a correctness probability
of at least
We will now describe how to find D, the dimension of Hl... To make the
description easier, we will regard all other elements (D, H, and p) fixed, but
only the interval I where the dimension may be found, is mentioned as an
input.
For an interval 1= {k, k + 1, ... ,k + r} we use notation M(I) = l~J for
the "midpoint" of I, B(I) = {k, k+ 1, ... ,M(I) + I} for the "initial part" and
T(I) = {M(I), ... ,k+r} for the "final part" of I. In the following algorithm,
D always stands for the true (unknown) dimension of Hl.. and initially, the
input of the following algorithm is the interval I = {a, 1, ... ,m}.
i.
i.
Algorithm C:
Determining the Dimension
1. If III = 1, output the unique element of I and stop.
2. If III > 1, run Algorithm A to decide if D ::; M(I). If the answer is "yes",
apply this algorithm with input B(I); otherwise, apply this algorithm
with input T(I).
Algorithm C thus finds the dimension D by recursively cutting the interval
{a, 1, ... ,m} into two approximately equally long parts, and using Algorithm
A to decide which one is the part containing D. It is therefore clear that
Algorithm C stops after 8 (log2 m) recursive calls. Since running Algorithm
C, we have to make only k = 8(log2 m) decisions based on Algorithm A,
which has a correctness probability of at most
we have that Algorithm
i,
C works with a correctness probability of p 2: (it = 8(ilOg2m) = 8(~).
Therefore, repeating Algorithm C O(m 2 ) times we get an algorithm which
gives the dimension D with a nonvanishing correctness probability.
It is left as an exercise to analyze the running time of the algorithms A,
B, and C to conclude that the generators of a hidden subgroup can be found
in polynomial time provided that p can be computed in polynomial time.
Remark 4.1.3. It is an open problem whether the hidden subgroup problem can be solved, by using a quantum computer, in polynomial time for
non-abelian groups as well. For progress in this direction, see [28J and the
references therein. The hidden subgroup problem for non-abelian groups is of
special interest, since the graph isomorphism problem can be reduced to it.
4.2 Examples
69
4.2 Examples
We will now demonstrate how the hidden subgroup problem can be applied
to computational problems. These examples are due to [45J.
4.2.1 Finding the Order
This problem lies at the very heart of Shor's factorization algorithm: Let n
be a large integer and a another integer such that gcd(a, n) = 1. Denote
r = ordn(a), i.e., r is the least positive integer that satisfies ar == 1 (mod n).
In this problem, we have a group Z, and the hidden subgroup is rZ, whose
generator r should be found.
The function p(x) = aX + nZ satisfies Simon's promise, since
¢=::}
p(x) = p(y)
aX + nZ = aY + nZ
n I (aX - aY )
¢=::}
rl(x-y)
¢=::}
x
¢=::}
+ rZ = y + rZ.
Because Z is an infinite group, we cannot directly solve this problem by using
the algorithm of the previous section, but using finite groups Z21 instead of
Z already gives us approximations that are good enough, as shown in the
previous chapter.
4.2.2 Discrete Logarithm
If F is a cyclic group of order q, then there is a generator 9 E F such that
F = {1 = gO,gl,g2, ... ,gq-1} (see Section 8.1 for notions of group theory).
Hence, if we are given a generator 9 of F and another element a E F, then
a can be uniquely expressed as a = gr, where r E {a, 1, 2, ... ,q - 1}. The
discrete logarithm problem is to find r.
Since gq = 1 (and q is the least positive number with this property),
we have that grl = gr2 if and only if r1 == r2 (mod q). Hence, instead of
regarding the exponent of 9 as an integer, we can regard it as an element of
Zq as well. In fact, mapping Zq ~ F, r + qZ H gr is an isomorphism.
We can reduce the discrete logarithm to finding the hidden subgroup as
follows:
If a = gr E F, let G = Zq x Zq. The hidden subgroup H = {(a, ra) I a E
Zq} is generated by element (1, r). That function p : Zq x Zq ~ G defined as
satisfies Simon's promise, can be verified easily:
70
4. Finding the Hidden Subgroup
P(XI, YI) = p(X2, Y2)
{==?
gXla-Yl
{==?
gXl -rYl
=
gX2 a -Y2
= gX2 -rY2
But the last equation is true if and only if Xl -rYI = X2 -rY2 (when regarded
as elements of Zq), which, in turn, is true if and only if Xl - X2 = r(YI - Y2).
This is equivalent to condition (Xl, YI) + H = (X2, Y2) + H.
On the other hand, there may be other generators of H than (1, r), but
using a number-theoretic argumentation similar to that used in Section 3.3.2,
we can see that the discrete logarithm can be found with a nonvanishing
probability in polynomial time.
Remark 4.2.1. Reliability of the U.S. Digital Signature Algorithm is based on
the assumption that the discrete logarithm problem on finite fields remains
intractable on classical computers [46].
4.2.3 Simon's Original Problem
Simon [65] originally formulated his problem as follows:
Input: An integer m 2: 1 and a function P : IF2 --+ R, where R is a finite set.
Promise: There exists a nonzero element s E IF2 such that for all x, y E IF;'
p(x) = p(y) if and only if x = y or x = y + s.
Output: Element s.
It is plain to see that choosing G as IF2 and H = {O, s} as the subspace
generated by s makes this problem a special case of a more general problem
presented at the beginning of this chapter.
Simon's problem is of historical interest, since it was the first one providing
an example of a problem that can be solved, with nonvanishing probability,
in polynomial time by using a quantum computer, but which requires exponential time in any classical algorithm if p is considered a blackbox function
(see Section 5.1.2).
4.3 Exercises
1. Show that if M is a generator matrix of H, then
2. Show that if M is a generator matrix of Hand N is a d' x m matrix
such that M N..L = 0, then the subspace generated by the rows of N is
orthogonal to H.
3. The elementary row operations 011 a matrix over field IF are:
4.3 Exercises
a) Swapping two rows, Xi H Xj.
b) Multiplying a row with a nonzero element of IF, Xi H CXi, C #
c) Adding to a row another row multiplied by any element of IF,
Xi
+ CXj.
71
o.
Xi H
Show that if M' is obtained from M by using the elementary row operations, then the rows of M and M' generate exactly the same subspace.
4. A matrix is in a reduced row-echelon form if:
a) For any row containing a nonzero entry, the first such is 1. The first
nonzero entry in a row is called the pivot.
b) The rows containing only zero entries, if there are such rows, are the
last ones.
c) The pivot on each row is on the right-hand side of the pivot of the
row above.
d) Each column containing a pivot of a row has zeros everywhere else.
If only conditions a), b), and c) are satisfied, we say that the matrix is
in row-echelon form.
Prove the following Gauss-Jordan elimination theorem. Each kxm matrix
can be transformed into a reduced row-echelon form by using O(k 3 m)
field operations. Moreover, the nonzero rows of a row-echelon form matrix
are linearly independent.
5. Prove that if M = (1d I M') is a generator (d x (m - d)) matrix of H,
then N = (_MT 11m - d ) is a generator matrix of HT.
5. Grover's Search Algorithm
5.1 Search Problems
Let us consider a children's game called hiding the key: one player hides the
key in the house, and the others try to find it. The one who has hidden the key
is permanently advising the others by using phrases like "freezing", "cold",
"warm", and "hot", depending on how close the seekers are to the hiding
place. Without this advice, the game would obviously last much longer. Or,
can you develop a strategy for finding the key without searching through the
entire house?
5.1.1 Satisfiability Problem
There are many problems in computer science which closely resemble searching for a tiny key in a large house. We will shortly discuss the problem of
finding a solution for an NP-complete 3-satisfiability problem: we are given
a propositional expression in a conjunctive normal form, each clause being a
disjunction of three literals (a Boolean variable or a negation of a Boolean
variable). In the original form of the problem, the task is to find a satisfying
truth assignment, if any such exists. Let us then imagine an advisor who has
enough power to always tell at once whether a given boolean expression has
a satisfying truth assignment or not. If such an advisor were provided, finding a satisfying assingnment is no longer a difficult problem: we could just
substitute 1 for some variable and ask the advisor whether the outcoming
Boolean expression with less variables had a demanded assignment or not. If
our choice was incorrect, we would flip the substituted value and proceed on
recursively.
Unfortunately, the advisor's problem to tell whether a satisfying valuation
exists is an NP-complete problem, and in light of our present knowledge, it
is very unlikely that there would be a fast solution to this problem in the
real world. But let us continue our thinking experiment by assuming that
somebody, let us call him a verifier, knows a satisfying assignment for a
Boolean expression but is not willing to tell it to us. Quite surprisingly, there
exist so-called zero-knowledge protocols (see [62] for more details), which the
verifier can use to guarantee us that he really knows a satisfying valuation
74
5. Grover's Search Algorithm
without revealing even a single bit of his knowledge. Thus, it is possible that
we are quite sure that a satisfying truth assignment exists, yet we do not have
a clue what it might be! The obvious strategy to find a satifying assignment is
to search through all of the possible assignments. But if there are n Boolean
variables in the expression, there are 2n assignments, and because we do not
have a supernatural advisor, our task seems quite hopeless for large n, at
least in the general case. In fact, no faster method than an exhaustive search
is known in the general case.
5.1.2 Probabilistic Search
In what follows, we will slightly generalize our search problem. Instead of
seeking a satisfying assignment for a Boolean expression, we will generally
talk about functions
f : JF2 -+ JF 2 ,
and our search problem is to find a x E JF2 such that f(x)
=
1 (if any such
x exists).
Notice that, with the assumption that f is computable in polynomial
time, this model is enough to represent NP problems because it includes
the NP-complete 3-satisfiability problem. This model is also assigned [31J to
unordered database search, where we have to find a specific item in a huge
database. Here the database consists of 2n items, and f is the function giving
a value of 1 to a required item and 0 to the rest, hereby telling us whether
the item under investigation was the required one.
Another simplification, a huge one this time, is to assume that f is a socalled blackbox function, i.e., we do not know how to compute f, but we can
query f, and the value is returned to us instantly in one computational step.
With that assumption, we will demonstate in the next chapter how to derive
a lower bound for the number of queries to f in order to find an item x such
that f(x) = 1. In the next chapter we will, in fact, present a general strategy
for finding lower bounds for the number queries concerning other goals, too.
Now we will study probabilistic search i.e., we will omit the requirement
that the search stragegy will always give such a value x that f(x) = 1, but
will only do this with a non vanishing probability. This means that the search
algorithm will give the required x with at least some constant probability
o < p :S 1 for any n. This can be seen as a natural generalization of the
search which with certainty returns the required item.
Remark 5.1.1. Provided that the probabilistic search strategy is rapid, we use
very standard argumentation to say that we can find x rapidly in practice: the
probability that, after m attempts we have not found x, is at most (l_p)m :S
e- pm , which is smaller than any given E > 0, when m > -~. Thus, we
can reduce the error probability p to any other positive constant E just by
repeating the original search a constant number of times.
5.1 Search Problems
75
It is easy to see that any classical deterministic search algorithm which
always gives an x such that f(x) = 1 (if such x exists) will require N = 2n
queries to f: if some algorithm makes less than N queries, we can modify f
in such a way that the answer is no longer correct.
Remark 5.1.2. In the above argumentation, it is, of course, essential that
we only handle blackbox functions. If we are, instead, given an algorithm
for computing f, it is very difficult to say how easily this algorithm itself
will give information about x to be found. In fact, the question about the
computational work required to recover x from the algorithm for f lies at
the very heart of the difficult open problem whether P #- NP.
We cannot do much better than deterministic search by allowing randomization: fix y E 1F2 and consider a blackbox function fy defined by
f ()
'II X
=
{ 1, if x
= y,
0, otherwise.
(5.1)
If we draw disjoint elements Xl, ... , Xk E 1F2 with uniform distribution, the
probability of finding y is .;" so we would need at least pN queries to find
y with a probability of at least p. Using non-uniform distributions will not
offer any relief:
Lemma 5.1.1. Let N = 2n and f be a blackbox function. Assume that AI
is a probabilistic algorithm that makes queries to f and returns an element
x E 1F2. If, for any non-constant f the probability that f(x) = 1 is at least
p > 0, then there is f such that AI makes at least pN - 1 queries.
Proof. Let fy be as in (5.1) and Py(k) be the probability that AI" returns
y using k queries. By assumption Py (k) ~ p, and we will demonstrate that
there'is some y E 1F2 such that Py(k) :s;
First we show by using induction that
W.
L
Py(k)
:s; k + l.
yEIF 2
If k = 0, then AI1I gives any x E 1F2 with some probability Pm and thus
L
yEIF 2
Py(O) =
L
py = l.
yEIF2
Assume, then, that
L
Py(k - 1)
:s; k.
yEIF 2
On the kth query AI" queries f(y) with some probability qy, and therefore
Py(k) :s; Py(k - 1) + qy. Thus,
5. Grover's Search Algorithm
76
Because there are N
Py(k):::;
= 2n
different choices for y, there must exist one with
k~ 1.
It follows that ~ 2: Py(k) 2: p, so k 2: pN - 1.
o
5.1.3 Quantum Search with One Query
Let us continue studying blackbox functions f : 1F2' ---+ 1F 2 . The natural
question arising now is whether we can devise a faster search method on
a quantum computer using quantum parallelism for querying many values of
a black box function simultaneously.
The very first thing we need to fix is the notion of a quantum blackbox function. The notion we follow here is widely accepted: let $x \in \mathbb{F}_2^n$. In order to make a blackbox function query $f(x)$ on a quantum computer, we utilize a source register $|x\rangle$ ($n$ qubits) and a target qubit $|b\rangle$. A query operator $Q_f$ is a linear mapping defined by
$$ Q_f |x\rangle|b\rangle = |x\rangle|b \oplus f(x)\rangle, \qquad (5.2) $$
where $\oplus$ means addition modulo 2 or, in other words, the exclusive-or operation. We can now easily see that (5.2) defines a unitary mapping. In fact, the vector set
$$ B = \{\, |x\rangle|b\rangle \mid x \in \mathbb{F}_2^n,\ b \in \mathbb{F}_2 \,\} $$
spans a $2^{n+1}$-dimensional Hilbert space. Mapping $Q_f$ operates as a permutation on $B$, and it is a well-known fact that any such mapping is unitary. Moreover, since
$$ Q_f Q_f |x\rangle|b\rangle = |x\rangle|b \oplus f(x) \oplus f(x)\rangle = |x\rangle|b\rangle, $$
the inverse of $Q_f$ is $Q_f$ itself.
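Since $Q_f$ permutes the basis $B$, it can be written down explicitly as a permutation matrix. The following numpy sketch is my own illustration (the indexing of $|x\rangle|b\rangle$ as $2x + b$ is an ad hoc choice); it checks that $Q_f$ is unitary and its own inverse:

```python
import numpy as np

def query_operator(f_values):
    """Build Q_f |x>|b> = |x>|b xor f(x)> as a 2^(n+1) x 2^(n+1) permutation
    matrix; f_values[x] is f(x), and |x>|b> is indexed by 2*x + b."""
    N = len(f_values)
    Q = np.zeros((2 * N, 2 * N))
    for x in range(N):
        for b in range(2):
            Q[2 * x + (b ^ f_values[x]), 2 * x + b] = 1.0
    return Q

# f_y with n = 2 and y = 3:
Q = query_operator([0, 0, 0, 1])
assert np.allclose(Q @ Q, np.eye(8))     # Q_f is its own inverse
assert np.allclose(Q @ Q.T, np.eye(8))   # and unitary (a permutation matrix)
```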
Let us now fix $y \in \mathbb{F}_2^n$ and try to devise a quantum search algorithm for the function $f_y$ defined in (5.1). The very first idea is to generate the state
$$ \frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle|0\rangle \qquad (5.3) $$
beginning with $|0\rangle|0\rangle$ and using the Hadamard-Walsh transform $H_n$ (cf. Section 3.1.2). After that, one could make a single query $Q_{f_y}$ to obtain the state
$$ \frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle|f_y(x)\rangle. \qquad (5.4) $$
But if we now observed the last qubit of state (5.4), we would see 1 (and, after that, by observing the first register, get the required $y$) only with a probability of $\frac{1}{2^n}$, so we would not gain any advantage over just guessing $y$.
But quantum search can improve on probabilistic search, as we will now demonstrate. Having state (5.3), we could flip the target bit (to $|1\rangle$), and then apply the Hadamard transform to that bit to get the state
$$ \frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) = \frac{1}{\sqrt{2^{n+1}}} \Big( \sum_{x \in \mathbb{F}_2^n} |x\rangle|0\rangle - \sum_{x \in \mathbb{F}_2^n} |x\rangle|1\rangle \Big). \qquad (5.5) $$
If $x \ne y$, then $Q_{f_y}|x\rangle|0\rangle = |x\rangle|0\rangle$ and $Q_{f_y}|x\rangle|1\rangle = |x\rangle|1\rangle$, but $Q_{f_y}|y\rangle|0\rangle = |y\rangle|1\rangle$ and $Q_{f_y}|y\rangle|1\rangle = |y\rangle|0\rangle$; so, querying $f_y$ by the query operator $Q_{f_y}$ on state (5.5) would give us the state
$$ \frac{1}{\sqrt{2^{n+1}}} \Big( \sum_{x \ne y} |x\rangle|0\rangle + |y\rangle|1\rangle - \sum_{x \ne y} |x\rangle|1\rangle - |y\rangle|0\rangle \Big) = \frac{1}{\sqrt{2^{n+1}}} \Big( \sum_{x \ne y} |x\rangle(|0\rangle - |1\rangle) + |y\rangle(|1\rangle - |0\rangle) \Big) $$
$$ = \frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} (-1)^{f_y(x)} |x\rangle \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle). \qquad (5.6) $$
Notice that preparing the target bit in the superposition $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ before applying the query operator was used to encode the value $f_y(x)$ in the sign $(-1)^{f_y(x)}$. This is a very basic technique in quantum computation. In the continuation we will not need the target bit anymore, and instead of (5.6) we will write only
$$ \frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} (-1)^{f_y(x)} |x\rangle. \qquad (5.7) $$
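This sign-encoding effect can be checked numerically for each basis state. The sketch below is my own illustration (indexing $|x\rangle|b\rangle$ as $2x + b$, as before):

```python
import numpy as np

f = [0, 0, 1, 0]                       # f_y on two qubits with y = 2
N = len(f)
minus = np.array([1.0, -1.0]) / np.sqrt(2.0)  # target bit (|0> - |1>)/sqrt(2)

for x in range(N):
    v = np.zeros(2 * N)                # the state |x>|->
    v[2 * x], v[2 * x + 1] = minus
    w = np.zeros(2 * N)                # apply Q_f: |x>|b> -> |x>|b xor f(x)>
    for b in range(2):
        w[2 * x + (b ^ f[x])] += v[2 * x + b]
    assert np.allclose(w, (-1) ** f[x] * v)   # Q_f |x>|-> = (-1)^f(x) |x>|->
print("the query operator only changes the sign, as in (5.6)")
```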
So far we have seen that by applying the query operator only once in a quantum superposition (5.4), we get state (5.7), ignoring the target qubit. We continue as follows: we write (5.7) in the form
$$ \frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle - \frac{2}{\sqrt{2^n}} |y\rangle $$
and apply the Hadamard transform $H_n$ to get
$$ |0\rangle - \frac{2}{2^n} \sum_{x \in \mathbb{F}_2^n} (-1)^{x \cdot y} |x\rangle = \Big(1 - \frac{2}{2^n}\Big)|0\rangle - \frac{2}{2^n} \sum_{x \ne 0} (-1)^{x \cdot y} |x\rangle. \qquad (5.8) $$
In superposition (5.8), we separate $|0\rangle$ from all other states by designing a function $f_0 : \mathbb{F}_2^n \to \mathbb{F}_2$, which takes value 1 at 0 and 0 for all other $x \in \mathbb{F}_2^n$. Such a function can be constructed by using $O(n)$ Boolean gates and, through the considerations of Section 2.3.3, it is also possible to construct $f_0$ by using $O(n)$ quantum gates and several ancilla qubits. We will store the value of $f_0$ in an additional qubit to obtain the state
$$ \Big(1 - \frac{2}{2^n}\Big)|0\rangle|1\rangle - \frac{2}{2^n} \sum_{x \ne 0} (-1)^{x \cdot y} |x\rangle|0\rangle. \qquad (5.9) $$
Now, observation of the last qubit of (5.9) will result in
$$ |0\rangle|1\rangle \qquad (5.10) $$
with a probability of $\big(1 - \frac{2}{2^n}\big)^2 = 1 - \frac{4}{2^n} + \frac{4}{4^n}$, and in
$$ \frac{1}{\sqrt{2^n - 1}} \sum_{x \ne 0} (-1)^{x \cdot y} |x\rangle|0\rangle = \frac{1}{\sqrt{2^n - 1}} \Big( \sum_{x \in \mathbb{F}_2^n} (-1)^{x \cdot y} |x\rangle - |0\rangle \Big)|0\rangle \qquad (5.11) $$
with a probability of $\frac{4}{2^n} - \frac{4}{4^n}$. Applying the Hadamard transform $H_n$ once more to states (5.10) and (5.11) will yield (ignoring the last qubit)
$$ \frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle \qquad (5.12) $$
and
$$ \frac{1}{\sqrt{2^n - 1}} \Big( \sqrt{2^n}\,|y\rangle - \frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle \Big), \qquad (5.13) $$
respectively. Finally, observing (5.12) and (5.13) will give us $y$ with probabilities $\frac{1}{2^n}$ and $\frac{2^n - 1}{2^n}$, respectively. Keeping in mind the probabilities of seeing states (5.10) and (5.11), we find that the total probability of observing $y$ is approximately $\frac{5}{2^n}$, which is about 2.5 times better than the best we could get with a randomized algorithm that queries $f_y$ only once (see the proof of Lemma 5.1.1).
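The whole one-query procedure can be simulated with matrices for small $n$; the sketch below is my own illustration (all helper names are ad hoc) and reproduces the total success probability of approximately $5/2^n$:

```python
import numpy as np

def hadamard(n):
    H1 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    H = np.array([[1.0]])
    for _ in range(n):
        H = np.kron(H, H1)
    return H

def one_query_success(n, y):
    """Total probability that the procedure (5.3)-(5.13) outputs y."""
    N = 2 ** n
    Hn = hadamard(n)
    state = np.full(N, 1.0 / np.sqrt(N))       # state (5.7): sign of |y> flipped
    state[y] *= -1.0
    state = Hn @ state                          # state (5.8)
    rest = state.copy()
    rest[0] = 0.0                               # branch (5.11), unnormalized
    p0, p1 = state[0] ** 2, rest @ rest         # branch probabilities
    out0 = Hn @ np.eye(N)[0]                    # (5.12): uniform superposition
    out1 = Hn @ (rest / np.sqrt(p1))            # (5.13)
    return p0 * out0[y] ** 2 + p1 * out1[y] ** 2

n = 6
print(one_query_success(n, y=5), 5 / 2 ** n)    # ~0.0752 vs 0.078125
```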
Soon we will see that, by using a quantum circuit, it is still possible to improve this probability. The next section is devoted to Lov Grover's ingenious idea of using quantum operations to amplify the probability of finding the required element.
5.2 Grover's Amplification Method
We will still use the blackbox function $f_y : \mathbb{F}_2^n \to \mathbb{F}_2$ from the previous section as an example. The task is to find $y$ by querying $f_y$, and we will follow the method of Lov Grover to demonstrate that, using a quantum circuit, we can find $y$ with nonvanishing probability by using $O(\sqrt{2^n})$ queries. We will also show how to generalize this method to all blackbox functions.
5.2.1 Quantum Operators for Grover's Search Algorithm
A query operator $Q_{f_y}$, which is used to call for the values of $f_y$, uses $n$ qubits for the source register and one for the target. We will also need a quantum operator $R_n$ defined on $n$ qubits and operating as $R_n|0\rangle = -|0\rangle$ and $R_n|x\rangle = |x\rangle$ if $x \ne 0$. If we index the rows and columns of a matrix by the elements of $\mathbb{F}_2^n$ (ordered as binary numbers), we can easily express $R_n$ as a $2^n \times 2^n$ matrix:
$$ R_n = \begin{pmatrix} -1 & 0 & 0 & \cdots & 0\\ 0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & & & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}. \qquad (5.14) $$
But we require that the operations in a quantum circuit be local, i.e., there should be a fixed upper bound on the number of qubits on which each gate operates. Let us, therefore, pay some attention to how to decompose (5.14) into local operations.
The decomposition can obviously be made by making use of the function $f_0 : \mathbb{F}_2^n \to \mathbb{F}_2$, which is 1 at 0 and 0 everywhere else. As discussed in the earlier section, a quantum circuit $F_n$ for this function can be constructed by using $O(n)$ simple quantum gates (which operate on at most three qubits) and some, say $k_n$, ancilla qubits. To summarize: we can obtain a quantum circuit $F_n$ on $n + k_n$ qubits operating as
$$ F_n |x\rangle|0\rangle|0\rangle = |x\rangle|f_0(x)\rangle|a_x\rangle. $$
Using this, we can proceed as follows: we begin with
$$ |x\rangle|1\rangle|0\rangle, \qquad (5.15) $$
and then operate with $H_2$ on the target bit to get
$$ |x\rangle \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) |0\rangle. $$
After this, we call $F_n$ to get
$$ (-1)^{f_0(x)} |x\rangle \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) |a_x\rangle. $$
Using first the reverse circuit $F_n^{-1}$ and then $H_2$ on the target qubit will give us
$$ (-1)^{f_0(x)} |x\rangle|1\rangle|0\rangle. $$
Composing all of this together, we get an operation
$$ |x\rangle|1\rangle|0\rangle \mapsto (-1)^{f_0(x)} |x\rangle|1\rangle|0\rangle, $$
which is the required operation $R_n$ with some ancilla qubits. As usual, we will not write down the ancillas explicitly.
The other tool needed for Grover's search is to encode the value $f(x)$ in the sign; but this can easily be achieved by preparing the target bit in a superposition:
$$ Q_f |x\rangle \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) = (-1)^{f(x)} |x\rangle \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle). \qquad (5.16) $$
Instead of (5.16), we will introduce the notation
$$ V_f |x\rangle = (-1)^{f(x)} |x\rangle $$
for the modified query operator, again omitting the ancilla qubit.
5.2.2 Amplitude Amplification
Grover's method for finding an element $y \in \mathbb{F}_2^n$ that satisfies $f(y) = 1$ (we will call such a $y$ a solution hereafter) can be viewed as iterative amplitude amplification. The basic element is the quantum operator
$$ G_n = -H_n R_n H_n V_f \qquad (5.17) $$
working on $n$ qubits, which represent elements $x \in \mathbb{F}_2^n$. In the above definition, $V_f$ is the modified query operator working as $V_f|x\rangle = (-1)^{f(x)}|x\rangle$, $H_n$ is the Hadamard transform, and $R_n$ is the operator which reverses the sign of $|0\rangle$: $R_n|0\rangle = -|0\rangle$ and $R_n|x\rangle = |x\rangle$ for $x \ne 0$.
It is interesting to see that $H_n R_n H_n$ can be written in quite a simple form as a $2^n \times 2^n$ matrix: if we let the elements $x \in \mathbb{F}_2^n$ index the rows and the columns, we find that the element of $H_n$ at location $(x, y)$ is $\frac{1}{\sqrt{2^n}}(-1)^{x \cdot y}$. Writing generally $A_{xy}$ for the element of a $2^n \times 2^n$ matrix $A$ at location $(x, y)$, we see (recall the matrix representation of $R_n$ in the previous section) that
$$ (H_n R_n H_n)_{xy} = \sum_{z \in \mathbb{F}_2^n} (H_n R_n)_{xz} (H_n)_{zy} = \frac{1}{2^n} \sum_{z \in \mathbb{F}_2^n} (-1)^{x \cdot z} (R_n)_{zz} (-1)^{z \cdot y} = \frac{1}{2^n} \Big( -2 + \sum_{z \in \mathbb{F}_2^n} (-1)^{(x+y) \cdot z} \Big) = \begin{cases} -\frac{2}{2^n}, & \text{if } x \ne y,\\ 1 - \frac{2}{2^n}, & \text{if } x = y. \end{cases} $$
Thus $H_n R_n H_n$ has the matrix representation
$$ \begin{pmatrix} 1 - \frac{2}{2^n} & -\frac{2}{2^n} & \cdots & -\frac{2}{2^n}\\ -\frac{2}{2^n} & 1 - \frac{2}{2^n} & \cdots & -\frac{2}{2^n}\\ \vdots & & \ddots & \vdots\\ -\frac{2}{2^n} & -\frac{2}{2^n} & \cdots & 1 - \frac{2}{2^n} \end{pmatrix}, $$
which can also be expressed as
$$ H_n R_n H_n = I - 2P, \qquad (5.18) $$
where $I$ is the $2^n \times 2^n$ identity matrix and $P$ is a $2^n \times 2^n$ projection matrix whose every entry is $\frac{1}{2^n}$. In fact, it is quite an easy task to verify that $P$ represents the projection onto the one-dimensional subspace generated by the vector
$$ \psi = \frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle. $$
Thus, using the notations of Chapter 7, we can write $P = |\psi\rangle\langle\psi|$, but this notation is not crucial here. It is more essential that representation (5.18) gives us an easy method for finding the effect of $-H_n R_n H_n$ on a general superposition
$$ \sum_{x \in \mathbb{F}_2^n} c_x |x\rangle. \qquad (5.19) $$
Writing
$$ A = \frac{1}{2^n} \sum_{x \in \mathbb{F}_2^n} c_x $$
(the average of the amplitudes), we find that
$$ \sum_{x \in \mathbb{F}_2^n} c_x |x\rangle = \sum_{x \in \mathbb{F}_2^n} A |x\rangle + \sum_{x \in \mathbb{F}_2^n} (c_x - A) |x\rangle \qquad (5.20) $$
is the decomposition of (5.19) into two orthogonal vectors: the first belongs to the subspace spanned by $\psi$; the second belongs to the orthogonal complement of that subspace. In fact, it is clear that the first summand on the right-hand side of (5.20) belongs to the subspace generated by $\psi$, and the orthogonality of the summands is easy to verify:
$$ \sum_{x \in \mathbb{F}_2^n} \sum_{y \in \mathbb{F}_2^n} A^* (c_y - A) \langle x \mid y \rangle = A^* \sum_{x \in \mathbb{F}_2^n} c_x - \sum_{x \in \mathbb{F}_2^n} A^* A = A^* 2^n A - 2^n A^* A = 0. $$
Therefore,
$$ P \sum_{x \in \mathbb{F}_2^n} c_x |x\rangle = \sum_{x \in \mathbb{F}_2^n} A |x\rangle, $$
and
$$ -H_n R_n H_n \sum_{x \in \mathbb{F}_2^n} c_x |x\rangle = (2P - I) \sum_{x \in \mathbb{F}_2^n} c_x |x\rangle = 2A \sum_{x \in \mathbb{F}_2^n} |x\rangle - \sum_{x \in \mathbb{F}_2^n} c_x |x\rangle = \sum_{x \in \mathbb{F}_2^n} (2A - c_x) |x\rangle. \qquad (5.21) $$
Expression (5.21) explains why the operation $-H_n R_n H_n$ is also called inversion about average: it operates on a single amplitude $c_x$ by multiplying it by $-1$ and adding twice the average.
We use the following example to explain how $G_n = -H_n R_n H_n V_f$ is used to amplify the desired amplitude in order to find the solution (recall that the solution is an element $y \in \mathbb{F}_2^n$ which satisfies $f(y) = 1$). In this example, we consider a function $f_5 : \mathbb{F}_2^n \to \mathbb{F}_2$, which takes value 1 at $y = (0, \dots, 0, 1, 0, 1)$ and is 0 everywhere else. The search begins with the superposition
$$ \frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle $$
having uniform amplitudes. Figure 5.1 depicts this superposition, with $c_0 = c_1 = \dots = c_{2^n - 1} = \frac{1}{\sqrt{2^n}}$.

Fig. 5.1. Amplitudes of the initial configuration.

The modified query operator $V_{f_5}$ flips the signs of all those amplitudes that are coefficients of a vector $|x\rangle$ satisfying $f_5(x) = 1$. In this example, $y = (0, \dots, 0, 1, 0, 1)$ is the only one having this property, and $c_5$ becomes $-\frac{1}{\sqrt{2^n}}$. This is depicted in Figure 5.2.

Fig. 5.2. Amplitudes after one query.
In the case of Figure 5.2, the average of the amplitudes is
$$ A = \Big(1 - \frac{2}{2^n}\Big) \frac{1}{\sqrt{2^n}}, $$
which is still very close to $\frac{1}{\sqrt{2^n}}$. Thus, the inversion-about-average operator $-H_n R_n H_n$ will perform the transformation
$$ \frac{1}{\sqrt{2^n}} \mapsto 2A - \frac{1}{\sqrt{2^n}} \approx \frac{1}{\sqrt{2^n}}, \qquad -\frac{1}{\sqrt{2^n}} \mapsto 2A + \frac{1}{\sqrt{2^n}} \approx 3 \cdot \frac{1}{\sqrt{2^n}}, $$
which is illustrated in Figure 5.3.

Fig. 5.3. Amplitudes after $-H_n R_n H_n V_{f_5}$.

It is thus possible to find the required $y = (0, \dots, 0, 1, 0, 1)$ with a probability $\approx \frac{9}{2^n}$ by just a single query. This is approximately 4.5 times better than a classical randomized search can do.
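The amplitude growth in this example is easy to verify numerically using the inversion-about-average form (5.21); a minimal sketch of mine, with ad hoc names:

```python
import numpy as np

def grover_iteration(state, solutions):
    """One step G_n = -H R H V_f, written via inversion about average (5.21)."""
    state = state.copy()
    state[solutions] *= -1.0      # V_f: flip the signs of solution amplitudes
    return 2.0 * state.mean() - state   # c_x -> 2A - c_x

n = 10
N = 2 ** n
state = np.full(N, 1.0 / np.sqrt(N))    # uniform initial superposition
state = grover_iteration(state, [5])    # f_5 from the example
print(state[5] * np.sqrt(N))            # ~ 3: amplitude grew to about 3/sqrt(2^n)
print(state[5] ** 2 * N)                # ~ 9: success probability about 9/2^n
```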
5.2.3 Analysis of Amplification Method
In this section, we will find out the effect of iterative use of the mapping $G_n = -H_n R_n H_n V_f$; but instead of a blackbox function $f : \mathbb{F}_2^n \to \mathbb{F}_2$ that admits only one solution, we will study a general $f$ having $k$ solutions (recall that here a solution means a vector $x \in \mathbb{F}_2^n$ such that $f(x) = 1$).

Let the notation $T \subseteq \mathbb{F}_2^n$ stand for the set of solutions, and $F = \mathbb{F}_2^n \setminus T$ for the set of non-solutions. Thus $|T| = k$ and $|F| = 2^n - k$. Assume that after $r$ iteration steps, the state is
$$ t_r \sum_{x \in T} |x\rangle + f_r \sum_{x \in F} |x\rangle. \qquad (5.22) $$
The modified query operator $V_f$ will then give us
$$ -t_r \sum_{x \in T} |x\rangle + f_r \sum_{x \in F} |x\rangle, \qquad (5.23) $$
and the average of the amplitudes of (5.23) is
$$ A = \frac{1}{2^n} \big( (2^n - k) f_r - k t_r \big). $$
The operator $-H_n R_n H_n$ will, therefore, transform (5.23) into
$$ (2A + t_r) \sum_{x \in T} |x\rangle + (2A - f_r) \sum_{x \in F} |x\rangle, $$
where $t_{r+1} = 2A + t_r = \big(1 - \frac{2k}{2^n}\big) t_r + \big(2 - \frac{2k}{2^n}\big) f_r$ and $f_{r+1} = 2A - f_r = -\frac{2k}{2^n} t_r + \big(1 - \frac{2k}{2^n}\big) f_r$. Collecting all of this together, we find that $G_n = -H_n R_n H_n V_f$ operates on (5.22) as the transformation
$$ \begin{pmatrix} t_{r+1}\\ f_{r+1} \end{pmatrix} = \begin{pmatrix} 1 - \frac{2k}{2^n} & 2 - \frac{2k}{2^n}\\[2pt] -\frac{2k}{2^n} & 1 - \frac{2k}{2^n} \end{pmatrix} \begin{pmatrix} t_r\\ f_r \end{pmatrix}. \qquad (5.24) $$
Therefore, to find out the effect of $G_n$ on the uniformly weighted initial superposition
$$ \frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle, $$
we have to solve (5.24) with boundary conditions $t_0 = f_0 = \frac{1}{\sqrt{2^n}}$. This can be done as follows: it is clear that $t_r$ and $f_r$ are real and, since $G_n$ is a unitary operator, the equation
$$ k\,t_r^2 + (2^n - k)\,f_r^2 = 1 \qquad (5.25) $$
must hold for each $r$. That is to say, each point $(t_r, f_r)$ lies on the ellipse defined by equation (5.25). Therefore, we can write
$$ t_r = \frac{1}{\sqrt{k}} \sin\theta_r, \qquad f_r = \frac{1}{\sqrt{2^n - k}} \cos\theta_r $$
for some number $\theta_r$. Recursion (5.24) can then be rewritten as
$$ \begin{cases} \sin\theta_{r+1} = \big(1 - \frac{2k}{2^n}\big) \sin\theta_r + \frac{2}{2^n} \sqrt{2^n - k}\,\sqrt{k}\, \cos\theta_r,\\[4pt] \cos\theta_{r+1} = -\frac{2}{2^n} \sqrt{2^n - k}\,\sqrt{k}\, \sin\theta_r + \big(1 - \frac{2k}{2^n}\big) \cos\theta_r. \end{cases} \qquad (5.26) $$
Recall that $k$ is the number of elements in $\mathbb{F}_2^n$ that satisfy $f(y) = 1$; hence, $1 - \frac{2k}{2^n} \in [-1, 1]$. Therefore, we can choose $\omega \in [0, \pi]$ such that $\cos\omega = 1 - \frac{2k}{2^n}$. Then $\sin\omega = \frac{2}{2^n} \sqrt{2^n - k}\,\sqrt{k}$, and (5.26) becomes
$$ \begin{cases} \sin\theta_{r+1} = \sin(\theta_r + \omega),\\ \cos\theta_{r+1} = \cos(\theta_r + \omega). \end{cases} $$
The boundary condition gives us that $\sin^2\theta_0 = \frac{k}{2^n}$, and it is easy to verify that the solution of the recursion is
$$ t_r = \frac{1}{\sqrt{k}} \sin(r\omega + \theta_0), \qquad f_r = \frac{1}{\sqrt{2^n - k}} \cos(r\omega + \theta_0), $$
where $\theta_0 \in [0, \pi/2]$ and $\omega \in [0, \pi]$ are chosen in such a way that $\sin^2\theta_0 = \frac{k}{2^n}$ and $\cos\omega = 1 - \frac{2k}{2^n}$. In fact, then $\cos\omega = 1 - 2\sin^2\theta_0 = \cos 2\theta_0$, so we have obtained the following lemma.
Lemma 5.2.1. The solution of (5.24) with boundary conditions $t_0 = f_0 = \frac{1}{\sqrt{2^n}}$ is
$$ t_r = \frac{1}{\sqrt{k}} \sin\big((2r+1)\theta_0\big), \qquad f_r = \frac{1}{\sqrt{2^n - k}} \cos\big((2r+1)\theta_0\big), \qquad (5.27) $$
where $\theta_0 \in [0, \pi/2]$ is determined by $\sin^2\theta_0 = \frac{k}{2^n}$.
We would like to find a suitable value for $r$ to maximize the probability of finding a solution. Since there are $k$ solutions, the probability of seeing one solution is
$$ k\,t_r^2 = \sin^2\big((2r+1)\theta_0\big). \qquad (5.28) $$
We would then like to find a non-negative $r$ as small as possible such that $\sin^2((2r+1)\theta_0)$ is as close to 1 as possible; i.e., the least positive integer $r$ such that $(2r+1)\theta_0$ is as close to $\frac{\pi}{2}$ as possible.
We notice that
$$ (2r+1)\theta_0 = \frac{\pi}{2} \iff r = -\frac{1}{2} + \frac{\pi}{4\theta_0}, $$
and, since $\theta_0^2 \ge \sin^2\theta_0 = \frac{k}{2^n}$, we have
$$ \frac{\pi}{4\theta_0} \le \frac{\pi}{4} \sqrt{\frac{2^n}{k}}, $$
and therefore after approximately
$$ \frac{\pi}{4} \sqrt{\frac{2^n}{k}} $$
iterations the probability of seeing a desired element $y \in \mathbb{F}_2^n$ is quite close to 1. To be more precise:
Theorem 5.2.1. Let $f : \mathbb{F}_2^n \to \mathbb{F}_2$ be such that there are $k$ elements $x \in \mathbb{F}_2^n$ satisfying $f(x) = 1$. Assume that $0 < k \le \frac{3}{4} \cdot 2^n$,¹ and let $\theta_0 \in [0, \pi/3]$ be chosen such that $\sin^2\theta_0 = \frac{k}{2^n} \le \frac{3}{4}$. After $\lfloor \frac{\pi}{4\theta_0} \rfloor$ iterations of $G_n$ on the initial superposition
$$ \frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle, $$
the probability of seeing a solution is at least $\frac{1}{4}$.
Proof. The probability of seeing a desired element is given by $\sin^2((2r+1)\theta_0)$, and we just saw that $r = -\frac{1}{2} + \frac{\pi}{4\theta_0}$ would give a probability of 1. Thus, we only have to estimate the error when $-\frac{1}{2} + \frac{\pi}{4\theta_0}$ is replaced by $\lfloor \frac{\pi}{4\theta_0} \rfloor$. Clearly
$$ \Big\lfloor \frac{\pi}{4\theta_0} \Big\rfloor = -\frac{1}{2} + \frac{\pi}{4\theta_0} + \delta $$
for some $\delta$ with $|\delta| \le \frac{1}{2}$. Therefore,
$$ \Big( 2\Big\lfloor \frac{\pi}{4\theta_0} \Big\rfloor + 1 \Big)\theta_0 = \frac{\pi}{2} + 2\delta\theta_0, $$
so the distance of $(2\lfloor \frac{\pi}{4\theta_0} \rfloor + 1)\theta_0$ from $\frac{\pi}{2}$ is $|2\delta\theta_0| \le \theta_0 \le \frac{\pi}{3}$. It follows that
$$ \sin^2\Big( \Big( 2\Big\lfloor \frac{\pi}{4\theta_0} \Big\rfloor + 1 \Big)\theta_0 \Big) \ge \sin^2\Big( \frac{\pi}{2} + \frac{\pi}{3} \Big) = \frac{1}{4}. \qquad \Box $$

¹ If $k > \frac{3}{4} \cdot 2^n$, we can find a desired solution $x$ with a probability of at least $\frac{3}{4}$ just by guessing, and if $k = 0$, then $G_n$ does not alter the initial superposition at all.
Grover's method for quantum search is summarized in the following algorithm:

Grover's Search Algorithm

Input: A blackbox function $f : \mathbb{F}_2^n \to \mathbb{F}_2$ and $k = |\{x \in \mathbb{F}_2^n \mid f(x) = 1\}|$.
Output: An element $y \in \mathbb{F}_2^n$ such that $f(y) = 1$ (a solution), if such an element exists.

1. If $k > \frac{3}{4} \cdot 2^n$, choose $y \in \mathbb{F}_2^n$ with uniform probability and stop.
2. Otherwise compute $r = \lfloor \frac{\pi}{4\theta_0} \rfloor$, where $\theta_0 \in [0, \pi/3]$ is determined by $\sin^2\theta_0 = \frac{k}{2^n}$.
3. Prepare the initial superposition $\frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle$ by using the Hadamard transform $H_n$.
4. Apply the operator $G_n$ $r$ times.
5. Observe to get some $y \in \mathbb{F}_2^n$.
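For small $n$ the algorithm is easy to simulate classically (at exponential cost, of course). The sketch below is my own illustration, not the book's circuit: it uses the inversion-about-average form (5.21) of $G_n$, and all names are ad hoc:

```python
import numpy as np

def grover_search(f_values, k, rng=np.random.default_rng(0)):
    """Simulate Grover's search for a blackbox f with a known number k of solutions."""
    N = len(f_values)
    if k > 3 * N // 4:                        # step 1: guessing is good enough
        return rng.integers(N)
    solutions = [x for x in range(N) if f_values[x] == 1]
    theta0 = np.arcsin(np.sqrt(k / N))        # sin^2(theta0) = k / 2^n
    r = int(np.pi / (4.0 * theta0))           # step 2: r = floor(pi / (4 theta0))
    state = np.full(N, 1.0 / np.sqrt(N))      # step 3: uniform superposition
    for _ in range(r):                        # step 4: apply G_n r times
        state[solutions] *= -1.0
        state = 2.0 * state.mean() - state
    probs = state ** 2                        # step 5: observe
    return rng.choice(N, p=probs / probs.sum())

# A single solution among 2^12 items is found with high probability:
f = [0] * 4096
f[1234] = 1
hits = sum(grover_search(f, k=1) == 1234 for _ in range(20))
print(hits, "of 20 runs found the solution")
```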
If $k > \frac{3}{4} \cdot 2^n$, then the first step clearly produces a correct output with a probability of at least $\frac{3}{4}$. Otherwise, according to Theorem 5.2.1, the algorithm gives a solution with a probability of at least $\frac{1}{4}$. In each case, we can find a solution with nonvanishing probability.

If $k = 1$ and $n$ is large, then $\lfloor \frac{\pi}{4\theta_0} \rfloor \approx \frac{\pi}{4} \sqrt{2^n}$, so using $O(\sqrt{2^n})$ queries to $f$ we can find a solution with nonvanishing probability, which is essentially better than any classical randomized algorithm can do.
The case $k = \frac{1}{4} \cdot 2^n$ is also very interesting. Then $\sin^2\theta_0 = \frac{1}{4}$, so $\theta_0 = \frac{\pi}{6}$. According to (5.28), the probability of seeing a solution after one single iteration of $G_n$ is
$$ \sin^2(3\theta_0) = \sin^2 \frac{\pi}{2} = 1. $$
Thus for $k = \frac{1}{4} \cdot 2^n$ we can find a solution with certainty using $G_n$ only once, which is clearly impossible using any classical search strategy.

In a typical situation we, unfortunately, do not know the value of $k$ in advance. In the next section, we present a simplified version of a method due to M. Boyer, G. Brassard, P. Høyer, and A. Tapp [11] for finding the required element even if $k$ is not known.
5.3 Utilizing Grover's Search Method
5.3.1 Searching with an Unknown Number of Solutions
We begin with an elementary lemma, whose proof is left as an exercise.
Lemma 5.3.1. For any real $\alpha$ and any positive integer $m$,
$$ \sum_{r=0}^{m-1} \cos\big((2r+1)\alpha\big) = \frac{\sin(2m\alpha)}{2\sin\alpha}. $$
The following important lemma can be found in [11].

Lemma 5.3.2. Let $f : \mathbb{F}_2^n \to \mathbb{F}_2$ be a blackbox function with $k \le \frac{3}{4} \cdot 2^n$ solutions and $\theta_0 \in [0, \frac{\pi}{3}]$ defined by the equation $\sin^2\theta_0 = \frac{k}{2^n}$. Let $m$ be any positive integer and $r \in [0, m-1]$ chosen with uniform distribution. If $G_n$ is applied to the initial superposition
$$ \frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle $$
$r$ times, then the probability of seeing a solution is
$$ P_m = \frac{1}{2} - \frac{\sin(4m\theta_0)}{4m\sin(2\theta_0)}. $$
Proof. In the previous section we saw that, after $r$ iterations of $G_n$, the probability of seeing a solution is $\sin^2((2r+1)\theta_0)$. Thus, if $r \in [0, m-1]$ is chosen uniformly, then the probability of seeing a solution is
$$ P_m = \frac{1}{m} \sum_{r=0}^{m-1} \sin^2\big((2r+1)\theta_0\big) = \frac{1}{2m} \sum_{r=0}^{m-1} \Big( 1 - \cos\big((2r+1)\,2\theta_0\big) \Big) = \frac{1}{2} - \frac{\sin(4m\theta_0)}{4m\sin(2\theta_0)}, $$
according to Lemma 5.3.1. $\Box$
Remark 5.3.1. If $m > \frac{1}{\sin(2\theta_0)}$, then
$$ \sin(4m\theta_0) \le 1 < m\sin(2\theta_0), $$
and therefore $\frac{\sin(4m\theta_0)}{4m\sin(2\theta_0)} \le \frac{1}{4}$. The above lemma implies, then, that $P_m \ge \frac{1}{4}$. This means that if $m$ is large enough, then applying $G_n$ $r$ times, where $r \in [0, m-1]$ is chosen uniformly, will yield a solution in any case with a probability of at least $\frac{1}{4}$.
Assume now that the unknown number $k$ satisfies $0 < k \le \frac{3}{4} \cdot 2^n$. Then
$$ \frac{1}{\sin(2\theta_0)} = \frac{1}{2\sin\theta_0\cos\theta_0} = \frac{2^n}{2\sqrt{k(2^n - k)}} \le \sqrt{\frac{2^n}{k}} \le \sqrt{2^n}, $$
so choosing $m \ge \sqrt{2^n}$ we certainly have $m \ge \frac{1}{\sin(2\theta_0)}$. This leads to the following algorithm:
Quantum Search Algorithm

Input: A blackbox function $f : \mathbb{F}_2^n \to \mathbb{F}_2$.
Output: Any $y \in \mathbb{F}_2^n$ such that $f(y) = 1$ (a solution), if such an element exists.

1. Pick an element $x \in \mathbb{F}_2^n$ randomly with uniform distribution. If $f(x) = 1$, then output $x$ and stop.
2. Otherwise, let $m = \lfloor \sqrt{2^n} \rfloor + 1$ and choose an integer $r \in [0, m-1]$ uniformly.
3. Prepare the initial superposition $\frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{F}_2^n} |x\rangle$ by using the Hadamard transform $H_n$, and apply $G_n$ $r$ times.
4. Observe to get some $y \in \mathbb{F}_2^n$.
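This unknown-$k$ variant can be simulated in the same way as before; the following sketch is my own illustration with ad hoc names:

```python
import numpy as np

def quantum_search_unknown_k(f_values, rng=np.random.default_rng(1)):
    """Simulate the search above when the number of solutions is unknown."""
    N = len(f_values)
    x = rng.integers(N)                       # step 1: classical guess
    if f_values[x] == 1:
        return x
    m = int(np.sqrt(N)) + 1                   # step 2: m = floor(sqrt(2^n)) + 1
    r = rng.integers(m)                       # r uniform in [0, m - 1]
    solutions = [z for z in range(N) if f_values[z] == 1]
    state = np.full(N, 1.0 / np.sqrt(N))      # step 3
    for _ in range(r):
        state[solutions] *= -1.0
        state = 2.0 * state.mean() - state
    probs = state ** 2
    return rng.choice(N, p=probs / probs.sum())   # step 4

# Repeating until success takes about four attempts on average (Remark 5.3.2):
f = [0] * 1024
for s in (7, 99, 500):
    f[s] = 1
attempts = 0
while True:
    attempts += 1
    if f[quantum_search_unknown_k(f)] == 1:
        break
print("found a solution after", attempts, "attempts")
```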
Using the previous lemmata, the correctness probability of this algorithm is easy to analyze: let $k$ be the number of solutions of $f$. If $k > \frac{3}{4} \cdot 2^n$, then the algorithm will output a solution after the first instruction with a probability of at least $\frac{3}{4}$. On the other hand, if $k \le \frac{3}{4} \cdot 2^n$, then
$$ m \ge \sqrt{\frac{2^n}{k}} \ge \frac{1}{\sin(2\theta_0)}, $$
and, according to Remark 5.3.1, the probability of seeing a solution is at least $\frac{1}{4}$ anyway.
Remark 5.3.2. Since the above algorithm is guaranteed to work with a probability of at least $\frac{1}{4}$, we can say that, on average, the solution can be found after four attempts. In each attempt, the number of queries to $f$ is at most $\sqrt{2^n}$.
A linear improvement of the above algorithm can be obtained by employing the methods of [11]. In any case, the above algorithm results in the following theorem.

Theorem 5.3.1. By using a quantum circuit making $O(\sqrt{2^n})$ queries to a blackbox function $f$, one can decide with nonvanishing correctness probability whether there is an element $x \in \mathbb{F}_2^n$ such that $f(x) = 1$.
In the next chapter we will present a clever technique formulated by R. Beals, H. Buhrman, R. Cleve, M. Mosca, and R. de Wolf [4], which will allow us to demonstrate that the above algorithm is, in fact, optimal up to a constant factor: any quantum algorithm that can discover, with nonvanishing probability, whether a solution to $f$ exists uses $\Omega(\sqrt{2^n})$ queries to $f$. This result concerns blackbox functions, so it does not imply a complexity lower bound for computable functions (see Remark 5.1.2).
On the other hand, the blackbox function $f$ can be replaced with a computable function to get the following theorem.

Theorem 5.3.2. By using a quantum circuit, any problem in NP can be solved with nonvanishing correctness probability in time $O(\sqrt{2^{p(n)}})$, where $p$ is a polynomial depending on the particular problem.

In the above theorem, the polynomial $p(n)$ is essentially the time that the nondeterministic computation needs to solve the problem (see Section 2.1.2).
Grover's search algorithm has been cunningly used for several purposes. To mention a few examples, see [13] for a quantum counting algorithm and [27] for a minimum-finding algorithm. In [11] the authors outline a generalized search algorithm for sets other than $\mathbb{F}_2^n$.
6. Complexity Lower Bounds for Quantum Circuits
Unfortunately, the most interesting questions related to quantum complexity theory, for example, "Is NP contained in BQP or not?", are extremely difficult and far beyond present knowledge. In this chapter we will mention some complexity-theoretic aspects of quantum computation which may somehow illustrate these difficult questions.
6.1 General Idea
A general technique for deriving complexity-theoretic lower bounds for quantum circuits using blackbox function queries was introduced by R. Beals, H. Buhrman, R. Cleve, M. Mosca, and R. de Wolf in [4]. Before going into the details, let us briefly discuss the idea informally. To simplify some notations, we will sometimes interpret an element $x \in \mathbb{F}_2^n$ as a binary number in the interval $[0, 2^n - 1]$.

Assume that a quantum circuit finds out whether an arbitrary blackbox function has some property or not (for instance, in Section 5.1.2 we were interested in whether there was an element $y \in \mathbb{F}_2^n$ such that $f(y) = 1$). This quantum circuit can then be seen as a device which is given the vector $(f(0), f(1), \dots, f(2^n - 1))$ (the values of $f$) as an input, and which outputs "yes" or "no" according to whether $f$ has the property or not. In other words, the quantum circuit can be seen as a device for computing a $\{0, 1\}$-valued function on the input vector $(f(0), f(1), \dots, f(2^n - 1))$ (unknown to us), which is to say that the circuit is a device for computing a Boolean function on $2^n$ input variables $f(0), f(1), \dots, f(2^n - 1)$.
Although a lot of things about Boolean functions are unknown to us, fortunately we do know many things about them; see [48] for discussion. It turns out that the polynomial representations of Boolean functions will give us the desired information about the number of blackbox queries needed to find out some property of $f$. The bound will be expressed in terms of the polynomial representation degree for quantum circuits that compute the property (Boolean function) exactly (with a probability of 1), and in terms of the polynomial approximation degree for circuits that find that property with high probability. We will see that the number of queries to the blackbox function bounds from above the representation degree of the Boolean function computable with a particular quantum circuit. The lower bound on the number of necessary queries is thus established by finding the representation degree of the Boolean function to be computed: the quantum circuit must make enough queries to reach the representation degree. Because polynomial representations and approximations play such a key role, we will devote the following two sections to studying them.
6.2 Polynomial Representations
6.2.1 Preliminaries
In this section, we present the basic facts about polynomial representations of Boolean functions and symmetric polynomials. A reader who is already familiar with these topics may skip this section.

Let $N = 2^n$. As discussed in Sections 3.1.1 and 8.2.2, the functions
$$ F : \mathbb{F}_2^N \to \mathbb{C} $$
form a $2^N$-dimensional vector space $V$ over the complex numbers. The idea is that Boolean functions on $N$ variables can be viewed as elements of $V$: they are functions defined on $N$ binary variables taking values in the two-element set $\{0, 1\}$, which can be embedded in $\mathbb{C}$.
For each subset $K = \{k_1, k_2, \dots, k_l\} \subseteq \{0, 1, \dots, N-1\}$ we consider a function
$$ X_K : \mathbb{F}_2^N \to \mathbb{C}, \qquad (6.1) $$
which we denote also by $X_K = X_{k_1} X_{k_2} \cdots X_{k_l}$ and define in the most natural way: the value of the function $X_K$ on a vector $x = (x_0, x_1, \dots, x_{N-1})$ is the product $x_{k_1} x_{k_2} \cdots x_{k_l}$, which is interpreted as the number 0 or 1. This may require some explanation: $x$ is an element of $\mathbb{F}_2^N$, and each of its components $x_i$ is an element of the binary field $\mathbb{F}_2$. Thus, the product $x_{k_1} x_{k_2} \cdots x_{k_l}$ also belongs to $\mathbb{F}_2$, but embedding $\mathbb{F}_2$ in $\mathbb{C}$, we may interpret the product as also belonging to $\mathbb{C}$. It is natural to call the functions $X_K$ monomials and to denote $X_\emptyset = 1$ (an empty product having no factors at all is interpreted as 1). If $|K| = l$, the degree of a monomial $X_K = X_{k_1} X_{k_2} \cdots X_{k_l}$ is defined to be $\deg X_K = l$. A linear combination of monomials $X_{K_1}, \dots, X_{K_s}$ is naturally denoted by
$$ P = c_1 X_{K_1} + \dots + c_s X_{K_s} \qquad (6.2) $$
and is called a polynomial. The degree $\deg P$ of polynomial (6.2) is defined as the highest degree of a monomial occurring in (6.2), except in the case of the zero polynomial $P = 0$, whose degree we symbolically define to be $-\infty$.
A reader may now wonder why we consider such a simple concept as a polynomial so carefully. The problematic point is that, over a finite field, the same function can be defined by many different polynomials. For example, both elements of the binary field $\mathbb{F}_2$ satisfy the equation $x^2 = x$; thus, the non-constant polynomial $x^2 - x + 1$ behaves like the constant polynomial 1.

We will, therefore, define the product of monomials $X_K$ and $X_L$ by considering $X_K$ and $X_L$ as functions $\mathbb{F}_2^N \to \mathbb{F}_2$. It is plain to see that this results in the definition $X_K X_L = X_{K \cup L}$. The product of two polynomials is defined as usual: if
$$ P_1 = \sum_{i=1}^{s} c_i X_{K_i} \quad\text{and}\quad P_2 = \sum_{j=1}^{t} d_j X_{L_j} $$
are polynomials, we define
$$ P_1 P_2 = \sum_{i=1}^{s} \sum_{j=1}^{t} c_i d_j X_{K_i} X_{L_j}. $$
The product differs slightly from the ordinary product of polynomials. Take, for example, $K = \{0, 1\}$ and $L = \{1, 2\}$. Then
$$ X_K X_L = X_0 X_1 X_1 X_2 = X_0 X_1 X_2 = X_{\{0,1,2\}}, $$
but if $X_0 X_1$ and $X_1 X_2$ were seen as "ordinary polynomials" over $\mathbb{F}_2$, the product would have been $X_0 X_1^2 X_2$. In fact, the functions behave just like ordinary polynomials, except that all powers above 1 must be reduced to 1.¹
Lemma 6.2.1. The monomials form a basis of $V$.

Proof. For any $y = (y_0, y_1, \dots, y_{N-1}) \in \mathbb{F}_2^N$, define a polynomial
$$ P_y = (1 + y_0 - X_0)(1 + y_1 - X_1) \cdots (1 + y_{N-1} - X_{N-1}). $$
By expanding the product, we can get a representation of $P_y$ as a sum of monomials, and for any $x = (x_0, x_1, \dots, x_{N-1}) \in \mathbb{F}_2^N$,
$$ P_y(x) = 1 \iff 1 + y_i - x_i = 1 \text{ for each } i \iff x_i = y_i \text{ for each } i \iff x = y, $$

¹ For a reader who is aware of algebraic structures: the polynomial functions considered here form a ring isomorphic to $\mathbb{F}_2[X_0, X_1, \dots, X_{N-1}]/I$, where $I$ is the ideal generated by all polynomials $X_i^2 - X_i$.
which is to say that $P_y$ is the characteristic function of the singleton set $\{y\}$. But because we can express each characteristic function $\mathbb{F}_2^N \to \mathbb{C}$ as a sum of monomials, we can surely express any function $\mathbb{F}_2^N \to \mathbb{C}$ as a linear combination of monomials (see Section 8.2.2). Therefore, the monomials generate $V$, and since there are $2^N$ different monomials, they are also linearly independent. $\Box$
It follows from the above lemma that each function $\mathbb{F}_2^N \to \mathbb{F}_2$ can be uniquely expressed as a linear combination of monomials (6.1) with complex coefficients. Clearly, all Boolean functions with $N$ variables are included. In fact, we could also have derived such a representation for Boolean functions directly: Boolean variables can be represented by monomials $X_0, \dots, X_{N-1}$, and if $P_1$ and $P_2$ are polynomial representations of some Boolean functions, then $\neg P_1$ can be represented as $1 - P_1$, $P_1 \wedge P_2$ as $P_1 P_2$, and $P_1 \vee P_2$ as $1 - (1 - P_1)(1 - P_2)$. Since all Boolean functions can be constructed by using $\neg$, $\wedge$, and $\vee$ (see Section 2.3.1), we have the following lemma.
Lemma 6.2.2. Boolean functions on $N$ variables have a unique representation as a multivariate polynomial $P(X_0, X_1, \dots, X_{N-1})$ having real coefficients and degree at most $N$.

We must keep in mind that "polynomial" in the above lemma is interpreted in such a way that $X^k = X$ for each $k \ge 1$.
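The unique multilinear representation of Lemma 6.2.2 can be computed mechanically. The sketch below is my own illustration (not from the book); it expands $B = \sum_y B(y)\,\chi_y$, where $\chi_y = \prod_{i:\,y_i=1} X_i \prod_{i:\,y_i=0} (1 - X_i)$ is the characteristic polynomial of $\{y\}$, a variant of the $P_y$ used in the proof of Lemma 6.2.1:

```python
from itertools import product

def multilinear_coefficients(B, N):
    """Coefficients {frozenset K: c_K} of the multilinear polynomial for B."""
    coeffs = {}
    for y in product((0, 1), repeat=N):
        if not B(y):
            continue
        ones = [i for i in range(N) if y[i] == 1]
        zeros = [i for i in range(N) if y[i] == 0]
        # Expand prod_{i in zeros} (1 - X_i): each subset S of zeros
        # contributes (-1)^{|S|} times the monomial X_{ones union S}.
        for bits in product((0, 1), repeat=len(zeros)):
            S = [zeros[j] for j in range(len(zeros)) if bits[j]]
            K = frozenset(ones) | frozenset(S)
            coeffs[K] = coeffs.get(K, 0) + (-1) ** len(S)
    return {K: c for K, c in coeffs.items() if c != 0}

# OR on 3 variables equals 1 - (1-X0)(1-X1)(1-X2), so deg(OR) = N = 3:
poly = multilinear_coefficients(lambda y: int(any(y)), 3)
print(max(len(K) for K in poly))    # 3
print(poly[frozenset({0, 1, 2})])   # 1, the coefficient of X0*X1*X2
```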
In the continuation we will mainly concentrate on symmetric Boolean functions, which form a subclass of the symmetric functions $F : \mathbb{F}_2^N \to \mathbb{C}$. Symmetric functions are formally defined as follows:

Definition 6.2.1. Let $x = (x_0, x_1, \dots, x_{N-1}) \in \mathbb{F}_2^N$ be any vector and $\pi : \{0, 1, \dots, N-1\} \to \{0, 1, \dots, N-1\}$ any permutation. Denote $\pi(x) = (x_{\pi(0)}, x_{\pi(1)}, \dots, x_{\pi(N-1)})$. A function $F : \mathbb{F}_2^N \to \mathbb{C}$ is symmetric if
$$ F(x) = F(\pi(x)) $$
holds for any vector $x \in \mathbb{F}_2^N$ and any permutation $\pi \in S_N$.
Clearly, any linear combination of symmetric functions is again symmetric; thus, the symmetric functions form a subspace $W \subseteq V$. We will now find a basis for $W$. Let $P$ be a nonzero polynomial representing a symmetric function. Then $P$ can be represented as a sum of homogeneous polynomials²
$$ P = Q_0 + Q_1 + \dots + Q_N $$
such that $\deg Q_i = i$. Because $P$ itself was assumed to be symmetric, each $Q_i$ must also be a symmetric polynomial (otherwise, by permuting the variables, we could get a different polynomial representing the same function as $P$ does, which is impossible). Thus, we can write
$$ Q_i = \sum c_{k_1, k_2, \dots, k_i} X_{k_1} X_{k_2} \cdots X_{k_i}, $$
where the sum is taken over all subsets of $\{0, 1, \dots, N-1\}$ having cardinality $i$. Because $Q_i$ is invariant under variable permutations, and because the representation as a polynomial is unique, we must have $c_{k_1, k_2, \dots, k_i} = c_i$ independently of the choice of $\{k_1, k_2, \dots, k_i\}$. That is to say, the symmetric polynomials
$$ V_0 = 1, $$
$$ V_1 = X_0 + X_1 + \dots + X_{N-1}, $$
$$ V_2 = X_0 X_1 + X_0 X_2 + \dots + X_{N-2} X_{N-1}, $$
$$ \vdots $$
$$ V_N = X_0 X_1 \cdots X_{N-1} $$
generate $W$. Because they have different degrees, they are also linearly independent.

² A homogeneous polynomial is a linear combination of monomials having the same degree.
The following triviality is worth mentioning: the value of a symmetric function $F(x)$ depends only on the Hamming weight of $x$, which is given by
$$ \mathrm{wt}(x) = |\{ i \mid x_i = 1 \}|. $$
We can even quite easily find the value of $V_i(x)$ when $\mathrm{wt}(x) = k$: consider the polynomial
$$ P(T) = (T - X_0)(T - X_1) \cdots (T - X_{N-1}). $$
Since $P$ is invariant under any permutation of $X_0, X_1, \dots, X_{N-1}$, the coefficients of $P(T)$ are symmetric polynomials in $X_0, X_1, \dots, X_{N-1}$. In fact, by expanding $P$, we find that the coefficient of $T^{N-i}$ is exactly $(-1)^i V_i$. On the other hand, by substituting 1 for $k$ variables $X_j$ and 0 for the rest, $P(T)$ becomes
$$ (T - 1)^k\, T^{N-k}, $$
which tells us that $V_i(x) = \binom{k}{i}$ if $i \le k$, and $V_i(x) = 0$ if $i > k$. This leads us to an important observation which will be used later.
Lemma 6.2.3. If $P$ is a symmetric polynomial in $N$ variables having complex (resp. real) coefficients and degree $d$, then there is a polynomial $P_w$ in a single variable with complex (resp. real) coefficients, of degree at most $d$, such that
$$ P_w(\mathrm{wt}(x)) = P(x) $$
for each $x \in \mathbb{F}_2^N$.

Proof. Because $P$ is symmetric, it can be represented as
$$ P = c_0 V_0 + c_1 V_1 + \dots + c_d V_d. $$
The claim follows from the fact that $V_i(x) = \binom{\mathrm{wt}(x)}{i}$ is a polynomial of $\mathrm{wt}(x)$ having degree $i$. $\Box$
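The identity $V_i(x) = \binom{\mathrm{wt}(x)}{i}$ behind this proof is easy to verify numerically; a small sketch of mine, assuming Python 3.8+ for math.comb:

```python
from math import comb
from itertools import product

N = 5
for x in product((0, 1), repeat=N):
    # Evaluate V_2 directly as the sum over all pairs a < b:
    direct = sum(x[a] * x[b] for a in range(N) for b in range(a + 1, N))
    assert direct == comb(sum(x), 2)          # V_2(x) = C(wt(x), 2)
print("V_2(x) == C(wt(x), 2) for all x in F_2^5")
```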
6.2.2 Bounds for the Representation Degrees
There are many complexity measures for a Boolean function: the number of Boolean gates $\neg$, $\wedge$, and $\vee$ needed to implement the function (Section 2.3.1) and decision tree complexity, to mention a few. For our purposes, the ideal complexity measure is the degree of the multivariate polynomial representing the function. We follow the notations of [4] in the definition below.

Definition 6.2.2. Let $B : \mathbb{F}_2^N \to \{0, 1\}$ be a Boolean function on $N$ variables. The polynomial representation degree $\deg(B)$ of $B$ is the degree of the unique multivariate polynomial representing $B$.

For future use, we will define polynomial approximations of a Boolean function. Since the interesting results about approximation pertain to polynomials with real coefficients, we will also restrict ourselves to polynomials with real coefficients.
Definition 6.2.3. A multivariate polynomial $P$ having real coefficients approximates a Boolean function $B : \mathbb{F}_2^N \to \{0, 1\}$ if
$$ |B(x) - P(x)| \le \frac{1}{3} $$
for each $x \in \mathbb{F}_2^N$.

Clearly, there may be many polynomials which approximate a single Boolean function $B$, but we are interested in the minimum degree of an approximating polynomial. Therefore, following [4], we give the following definition.

Definition 6.2.4. The polynomial approximation degree of a Boolean function $B$ is
$$ \widetilde{\deg}(B) = \min\{ \deg(P) \mid P \text{ is a polynomial approximating } B \}. $$
It is a nontrivial task to find bounds for the polynomial representation degree, but we can derive a general bound for symmetric Boolean functions (for more bounds, see [4] and the references therein). The following lemma can be found in [4]:

Lemma 6.2.4. Let $B : \mathbb{F}_2^N \to \{0, 1\}$ be a non-constant symmetric Boolean function. Then $\deg(B) \ge N/2 + 1$.

Proof. Let $P$ be the symmetric multivariate polynomial of degree $\deg(B)$ that represents $B$. By Lemma 6.2.3, there is a single-variable polynomial $P_w$ such that $P_w(\mathrm{wt}(x)) = P(x)$ for each $x \in \mathbb{F}_2^N$. As a representation of the Boolean function $B$, $P(x)$ is either 0 or 1, and since there are $N + 1$ possible Hamming weights (from 0 to $N$) and $P$ is not constant, $P_w(\mathrm{wt}(x)) = 0$ for at least $N/2 + 1$ integers $\mathrm{wt}(x)$, or $P_w(\mathrm{wt}(x)) = 1$ for at least $N/2 + 1$ integers $\mathrm{wt}(x)$. In the former case, we conclude that $\deg(P_w) \ge N/2 + 1$, and in the latter, we conclude that $\deg(P_w) = \deg(P_w - 1) \ge N/2 + 1$. Altogether, we find that
$$ \deg(B) = \deg(P) = \deg(P_w) \ge N/2 + 1. \qquad \Box $$
For some particular functions, we can, of course, get better results than the previous lemma gives. Consider, for instance, the function $\mathrm{AND} : \mathbb{F}_2^N \to \{0, 1\}$ defined by $\mathrm{AND}(x) = 1$ if and only if $\mathrm{wt}(x) = N$. Clearly AND is a symmetric function, and the corresponding single-variable polynomial $\mathrm{AND}_w$ representing AND has $N$ zeros $\{0, 1, \dots, N-1\}$. Therefore, $\deg(\mathrm{AND}) = N$. Using a similar argument, we also find that $\deg(\mathrm{OR}) = N$, where OR is the function taking value 0 if and only if $\mathrm{wt}(x) = 0$.
It appears much more difficult to find bounds for the degree of an approximating polynomial. R. Paturi has proved [50] the following theorem, which characterizes the degree of polynomials approximating symmetric Boolean functions. To state Paturi's theorem, we fix some notations. Let $B : \mathbb{F}_2^N \to \{0, 1\}$ be a symmetric Boolean function. Then $B$ can be represented via a single-variable polynomial as $B_w(\mathrm{wt}(x)) = B(x)$, depending only on the Hamming weight of $x$. Let
$$ \Gamma(B) = \min\{\, |2k - N + 1| \mid B_w(k) \ne B_w(k+1) \text{ and } 0 \le k \le N - 1 \,\}. $$
Thus, $\Gamma(B)$ measures how close to weight $N/2$ the function $B_w$ changes value: if $B_w(k) \ne B_w(k+1)$ for some $k$ approximately $N/2$, then $\Gamma(B)$ is low. Recall that $B_w(k) \ne B_w(k+1)$ means that, if $\mathrm{wt}(x) = k$, then the value $B(x)$ changes if we flip one more coordinate of $x$ to 1.

Example 6.2.1. Consider the function OR, whose only change occurs when the weight of the argument increases from 0 to 1. Thus, $\Gamma(\mathrm{OR}) = N - 1$. Similarly, we see that the only jump of the function AND occurs when the weight of $x$ increases from $N - 1$ to $N$. Therefore, we have that $\Gamma(\mathrm{AND}) = N - 1$.
Example 6.2.2. Let $\mathrm{PARITY} : \mathbb{F}_2^N \to \{0, 1\}$ be the function defined by $\mathrm{PARITY}(x) = 1$ if $\mathrm{wt}(x)$ is divisible by 2, and 0 otherwise. The function PARITY always changes its value when the Hamming weight of $x$ increases. For our choice, $N = 2^n$ is even, so $\Gamma(\mathrm{PARITY}) = 1$, but the theorem below also holds if $N$ is odd, and then $\Gamma(\mathrm{PARITY}) = 0$.

The function $\mathrm{MAJORITY} : \mathbb{F}_2^N \to \{0, 1\}$ is defined to take value 0 if $\mathrm{wt}(x) < N/2$, and 1 if $\mathrm{wt}(x) \ge N/2$. For even $N$, the only jump occurs when the weight of $x$ passes from $N/2 - 1$ to $N/2$, and for odd $N$ when $\mathrm{wt}(x)$ increases from $(N-1)/2$ to $(N+1)/2$; so, $\Gamma(\mathrm{MAJORITY}) = 1$ for even $N$ and 0 for odd $N$.
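Computing $\Gamma(B)$ from the single-variable representation $B_w$ is a one-liner; the sketch below (my own, with ad hoc names) reproduces the values derived in Examples 6.2.1 and 6.2.2:

```python
def gamma(B_w, N):
    """Paturi's quantity: min |2k - N + 1| over the jumps of B_w."""
    return min(abs(2 * k - N + 1)
               for k in range(N) if B_w(k) != B_w(k + 1))

N = 16
print(gamma(lambda k: int(k > 0), N))          # OR:       N - 1 = 15
print(gamma(lambda k: int(k == N), N))         # AND:      N - 1 = 15
print(gamma(lambda k: int(k % 2 == 0), N))     # PARITY:   1
print(gamma(lambda k: int(k >= N / 2), N))     # MAJORITY: 1
```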
Theorem 6.2.1 (R. Paturi, [50]). Let $B : \mathbb{F}_2^N \to \{0, 1\}$ be a non-constant symmetric Boolean function. Then
$$ \widetilde{\deg}(B) = \Theta\Big( \sqrt{N\big(N - \Gamma(B)\big)} \Big). $$
Earlier, we saw that $\deg(\mathrm{OR}) = \deg(\mathrm{AND}) = N$, which means that an exact representation of OR and AND requires a polynomial of degree $N$. On the other hand, by using the previous examples and Paturi's theorem, we find that a polynomial approximating these functions may have an essentially lower degree: $\widetilde{\deg}(\mathrm{OR}) = \Theta(\sqrt{N})$ and $\widetilde{\deg}(\mathrm{AND}) = \Theta(\sqrt{N})$. However, $\widetilde{\deg}(\mathrm{PARITY}) = \Theta(N)$ and $\widetilde{\deg}(\mathrm{MAJORITY}) = \Theta(N)$, so even an approximating polynomial for these two latter functions has degree $\Omega(N)$.
6.3 Quantum Circuit Lower Bound
6.3.1 General Lower Bound
We are now ready to present the idea of [4], which connects quantum circuits computing Boolean functions with polynomials representing and approximating Boolean functions.

We consider blackbox functions $f : \mathbb{F}_2^n \to \mathbb{F}_2$, and, as earlier, quantum blackbox queries are modelled by using a query operator $Q_f$ operating as
$$ Q_f |x\rangle|b\rangle = |x\rangle|b \oplus f(x)\rangle. $$
There may also be more qubits for other computational purposes but, without violating generality, we may assume that a blackbox query takes place on fixed qubits. Therefore, we can assume that a general state of a quantum circuit computing a property of $f$ is a superposition of states of the form
$$ |x\rangle|b\rangle|w\rangle, \qquad (6.3) $$
where the first register has $n$ qubits, which are used as the source bits of a blackbox query, $b \in \mathbb{F}_2$ is the target qubit, and $w$ is a string of some number, say $m$, of qubits, which are needed for other computational purposes. There are $2^{n+m+1}$ different states (6.3), and we can expand the definition of $Q_f$ to include all states (6.3) by letting it operate on $|w\rangle$ as the identity operator; we will denote the enlarged operator again by $Q_f$:
$$ Q_f |x\rangle|b\rangle|w\rangle = |x\rangle|b \oplus f(x)\rangle|w\rangle. $$
We begin by finding the eigenvectors of $Q_f$. It is easy to verify that, if $s$ is either 0 or 1, then
$$ Q_f\big( |x\rangle|0\rangle|w\rangle + (-1)^s |x\rangle|1\rangle|w\rangle \big) = (-1)^{f(x) \cdot s} \big( |x\rangle|0\rangle|w\rangle + (-1)^s |x\rangle|1\rangle|w\rangle \big), \qquad (6.4) $$
which means that
$$ |x\rangle|0\rangle|w\rangle + (-1)^s |x\rangle|1\rangle|w\rangle \qquad (6.5) $$
is an eigenvector of $Q_f$ belonging to the eigenvalue $(-1)^{f(x) \cdot s}$. On the other hand, vector (6.5) can be chosen in $2^{n+m+1}$ different ways, and it is simple to verify that these vectors form an orthogonal set, so the vectors (6.5) are all the eigenvectors of $Q_f$. It is clear that if a quantum circuit uses the query operator $Q_f$, the state of the circuit depends on the values of $f$, but equation (6.4) tells us the dependence in a more explicit manner: $Q_f$ introduces a factor $(-1)^{f(x) \cdot s}$, which has the polynomial representation $1 - 2sf(x)$. The next lemma from [4] expresses this dependency more precisely.
Lemma 6.3.1. Let $Q$ be a quantum circuit that makes $T$ queries to a blackbox function $f : \mathbb{F}_2^n \to \mathbb{F}_2$. Let $N = 2^n$ and identify each binary string in $\mathbb{F}_2^n$ with the number it represents. Then the final state of the circuit is a superposition of states (6.3) whose coefficients are functions of the values $X_0 = f(0)$, $X_1 = f(1)$, \dots, $X_{N-1} = f(N-1)$. These functions have a polynomial representation degree of at most $T$.
Proof. The computation of $Q$ can be viewed as a sequence of unitary transformations
$$ U_0,\ Q_f,\ U_1,\ Q_f,\ U_2,\ \dots,\ U_{T-1},\ Q_f,\ U_T $$
in a complex vector space spanned by all states (6.3). The operators $U_i$ are fixed, but $Q_f$, of course, depends on the particular blackbox function $f$, so the coefficients are also functions of $X_0, X_1, \dots, X_{N-1}$. At the beginning of the previous section we learned that each such function has a unique polynomial representation. We prove the claim on the degree by induction. Before the first query, the state of the circuit is clearly a superposition of states (6.3) with constant coefficients. Assume, then, that after $k$ queries and after applying the operator $U_k$, the circuit state is a superposition
$$ \sum_{x \in \mathbb{F}_2^n} \sum_{b \in \mathbb{F}_2} \sum_{w \in \mathbb{F}_2^m} P_{(x,b,w)} |x\rangle|b\rangle|w\rangle, \qquad (6.6) $$
where each $P_{(x,b,w)}$ is a polynomial in the variables $X_0, X_1, \dots, X_{N-1}$ having degree at most $k$. The $(k+1)$th query will translate $|x\rangle|0\rangle|w\rangle$ into $|x\rangle|1\rangle|w\rangle$ and vice versa if $X_x = f(x) = 1$, but this translation will not happen if $X_x = f(x) = 0$ (again, we interpret $x$ as the binary representation of an integer in $[0, N-1]$). This is to say that a partial sum
$$ P_{(x,0,w)} |x\rangle|0\rangle|w\rangle + P_{(x,1,w)} |x\rangle|1\rangle|w\rangle \qquad (6.7) $$
in (6.6) will transform into
$$ \big( X_x P_{(x,1,w)} + (1 - X_x) P_{(x,0,w)} \big) |x\rangle|0\rangle|w\rangle + \big( X_x P_{(x,0,w)} + (1 - X_x) P_{(x,1,w)} \big) |x\rangle|1\rangle|w\rangle. \qquad (6.8) $$
Using the assumption that $\deg P_{(x,b,w)} \le k$, (6.8) directly tells us that, after $k+1$ queries to $Q_f$, the coefficients are polynomials of degree at most $k+1$. The operator $U_{k+1}$ produces linear combinations of these coefficient functions, which cannot increase the representation degrees. $\Box$
Remark 6.3.1. The transformation from (6.7) to (6.8) caused by $Q_f$ is, of course, exactly the same as (6.4), only expressed in a different basis. By utilizing basis (6.5), Farhi et al. [29] have, independently of [4], derived a lower bound for the number of quantum queries needed to approximate PARITY.

Remark 6.3.2. Notice that the proof of Lemma 6.3.1 does not utilize the fact that each $U_i$ is unitary; only the linearity is needed. In fact, if we could use nonlinear operators as well, the reader may easily verify that we could achieve much faster growth in the representation degree of the coefficients.
The following theorem from [4] finally connects quantum circuits and polynomial representations.

Theorem 6.3.1. Let $N = 2^n$, $f : \mathbb{F}_2^n \to \mathbb{F}_2$ an arbitrary blackbox function, and $Q$ a quantum circuit that computes a Boolean function $B$ on $N$ variables $X_0 = f(0)$, $X_1 = f(1)$, \dots, $X_{N-1} = f(N-1)$.

1. If $Q$ computes $B$ with a probability of 1, using $T$ queries to $f$, then $T \ge \deg(B)/2$.
2. If $Q$ computes $B$ with a correctness probability of at least $\frac{2}{3}$, using $T$ queries to $f$, then $T \ge \widetilde{\deg}(B)/2$.
Proof. Without loss of generality, we may assume that $Q$ gives the value of $B$ by setting the right-most qubit to $B(X_0, X_1, \dots, X_{N-1})$, which is then observed. By Lemma 6.3.1, the final state of $Q$ is a superposition
$$ \sum_{x \in \mathbb{F}_2^n} \sum_{b \in \mathbb{F}_2} \sum_{w \in \mathbb{F}_2^m} P_{(x,b,w)} |x\rangle|b\rangle|w\rangle, $$
where each coefficient $P_{(x,b,w)}$ is a function of $X_0, \dots, X_{N-1}$ which has a polynomial representation of degree at most $T$. The probability of seeing 1 on the right-most qubit is given by
$$ P(X_0, \dots, X_{N-1}) = \sum |P_{(x,b,w)}|^2, \qquad (6.9) $$
where the sum is taken over all those states where the right-most bit of $w$ is 1. By considering the real and imaginary parts separately, we see that $P(X_0, \dots, X_{N-1})$ can, in fact, be represented as a polynomial in the variables $X_0, \dots, X_{N-1}$ with real coefficients and degree at most $2T$.

If $Q$ computes $B$ exactly, then (6.9) is 1 if and only if $B$ is 1 and, therefore, $P(X_0, \dots, X_{N-1}) = B$. Thus, $2T \ge \deg(P) = \deg(B)$, and (1) follows. If $Q$ computes $B$ with a probability of at least $\frac{2}{3}$, then (6.9) is at most $\frac{1}{3}$ apart from 1 when $B$ is 1 and, similarly, (6.9) is at most $\frac{1}{3}$ apart from 0 when $B$ is 0. This means that (6.9) is a polynomial approximating $B$, so $2T \ge \deg(P) \ge \widetilde{\deg}(B)$, and (2) follows. $\Box$
6.3.2 Some Examples
We can apply the main theorem (Theorem 6.3.1) of the previous section by taking $B = \mathrm{OR}$ to arrive at the following theorem.

Theorem 6.3.2. If $Q$ is a quantum algorithm that makes $T$ queries to a blackbox function $f$ and decides with a correctness probability of at least $\frac{2}{3}$ whether there is an element $x \in \mathbb{F}_2^n$ such that $f(x) = 1$, then $T \ge c\sqrt{2^n}$, where $c$ is a constant.

Proof. To decide if $f(x) = 1$ for some $x \in \mathbb{F}_2^n$ is to compute the function OR on the values $f(0), f(1), \dots, f(2^n - 1)$. According to Theorem 6.2.1, OR on $N = 2^n$ variables has approximation degree $\Omega(\sqrt{N})$, so, by Theorem 6.3.1, computing OR on $N = 2^n$ variables (with a bounded probability of error) requires $\Omega(\sqrt{N}) = \Omega(\sqrt{2^n})$ queries to $f$. $\Box$
The above theorem shows that the method of using Grover's search algorithm to decide whether a solution for $f$ exists is optimal up to a constant factor. Similar results can be derived for other functions whose representation degree is known.

Theorem 6.3.3. Deciding whether a blackbox function $f : \mathbb{F}_2^n \to \mathbb{F}_2$ has an even number of solutions requires $\Omega(2^n)$ queries to $f$ for any quantum circuit.

Proof. Take $B = \mathrm{PARITY}$, apply Paturi's theorem (Theorem 6.2.1), and use reasoning similar to the above proof. $\Box$
Remark 6.3.3. In [4], the authors also derive results analogous to the previous ones for so-called Las Vegas quantum circuits, which must always give the correct answer, but which can sometimes be "ignorant", i.e., give the answer "I do not know", with a probability of at most $\frac{1}{2}$. Moreover, in [4] it is shown that if a quantum circuit computes a Boolean function $B$ on variables $X_0, X_1, \dots, X_{N-1}$ using $T$ queries to $f$ with a bounded error probability, then there is a classical deterministic algorithm that computes $B$ exactly and uses $O(T^6)$ queries to $f$. Moreover, if $B$ is symmetric, then $O(T^6)$ can even be replaced with $O(T^2)$.
Remark 6.3.4. It can be shown that only a vanishing fraction of Boolean functions on $N$ variables have representation degree lower than $N$, and the same holds true for the approximation degree [1]. Thus, for almost any $B$, it requires $\Omega(N)$ queries to $f$ to compute $B$ on $f(0), f(1), \dots, f(N-1)$, even with a bounded error probability.

Remark 6.3.5. With a little effort, we can utilize the results presented in this section to say something about oracle Turing machine computation: for almost all oracles $X$, $\mathbf{NP}^X$ is not included in $\mathbf{BQP}^X$. However, this does not imply that NP is not included in BQP, but it offers some reason to believe so. On the other hand, blackbox functions are the most "unstructured" examples that one can have; in [21] W. van Dam gives an example of breaking the blackbox lower bound by replacing an arbitrary $f$ with a more structured function.
7. Appendix A: Quantum Physics
7.1 A Brief History of Quantum Theory
We may say that quantum mechanics was born at the beginning of the twentieth century, when experiments on atoms, molecules, and radiation physics could not be explained by classical physics. As an example, we mention the quantum hypothesis of Max Planck in 1900¹ in connection with the radiation of a black body.

A black body is a hypothetical object that is able to emit and absorb electromagnetic radiation of all possible wavelengths. Although we cannot manufacture ideal black bodies, for experimental purposes a small hole in an enclosure is a good approximation of a black body. Theoretical considerations revealed that the radiation intensity at different frequencies² should be the same for all black bodies. In the nineteenth century, there were two contradictory theories which described how the curve depicting this intensity should look: the first one was Wien's radiation law; the second one was the Rayleigh-Jeans law of radiation. Both of them failed to predict the observed frequency spectrum, i.e., the distribution describing how the intensity of the radiation depends on the frequency.

The spectrum of the radiation became understandable when Planck published his radiation law [69]. An especially remarkable feature of Planck's work was that his law of radiation was derived under the assumption that electromagnetic radiation is emitted in discrete quanta whose energy $E$ is proportional to the frequency $\nu$:
$$ E = h\nu. \qquad (7.1) $$
The constant $h$ was eventually uncovered by studying the experimental data. This famous constant is called Planck's constant, and its value is approximately given by
$$ h = 6.62608 \cdot 10^{-34}\ \mathrm{J\,s}. \qquad (7.2) $$

¹ M. Planck introduced his ideas in a lecture at the German Physical Society on December 14, 1900. As a general reference to early works on quantum physics, we mention [69].
² We could just as well talk about the wavelength $\lambda$ instead of the frequency $\nu$; these quantities are related by the formula $\nu\lambda = c$, where $c$ is the vacuum speed of light.
Planck's quantum hypothesis contains a somewhat disturbing idea: since the days of Galileo and Newton, there had been a debate in the scientific community on whether light should be regarded as a particle flow or as waves. This discussion seemed to cease in the nineteenth century, mainly for the following reasons. First, at the beginning of the nineteenth century, Young demonstrated with his famous two-slit experiment that light inherently has an undulatory nature. The second reason was that research at the end of the nineteenth century showed that light is, in fact, electromagnetic radiation, which is better described as waves. The research mentioned was mainly carried out by Maxwell and Hertz.

At least implicitly, Planck again called upon the old question about the nature of light: the explanation of black-body radiation, which finally agreed with observations to within the measurement precision, was derived under the hypothesis that light is made of discrete quanta.

Later, in 1905, Albert Einstein explained the photoelectric effect, basing the explanation on Planck's quantum hypothesis [69]. The photoelectric effect means that negatively charged metal loses its electrical charge when it is exposed to electromagnetic radiation of a certain frequency. It was not understood why the photoelectric effect so fundamentally depended on the frequency of the radiation and not on the intensity. In fact, reducing the intensity will slow down the effect, but cutting down the frequency will completely stop it. Einstein pointed out that, under Planck's hypothesis (7.1), this dependency on frequency becomes quite natural: the radiation should be regarded as a stream of particles whose energy is proportional to the frequency. It should then be quite clear that increasing the frequency of the radiation will also increase the impact of the radiation quanta on the electrons. When the impact becomes large enough, the electrons fly off and the negative charge disappears.³ Einstein also introduced the notion of the photon, a light quantum.

Planck's radiation law and Einstein's explanation of the photoelectric effect opened the door for a novel idea: electromagnetic radiation, which was traditionally treated as wave motion, possesses some characteristic features of a particle flow.

The experimental energy spectrum of the hydrogen atom was explained, up to the measurement precision, by Niels Bohr in 1913 [69]. The model of the hydrogen atom introduced by Bohr provided some evidence that electrons, which had been considered as particles, may have wave-like characteristics as well. More evidence of the wave-like characteristics of electrons came to light in the interference experiment carried out by C. J. Davisson and L. H. Germer in 1927.

³ It should be emphasized here that the explanation given by Einstein is presented only as a historical remark. According to modern quantum physics, the photoelectric effect can also be explained by assuming the undulatory nature of light and the quantum mechanical nature of matter.
Inspired by the previous results, Louis de Broglie introduced a general hypothesis in 1924: particles may also be described as waves⁴ whose wavelength $\lambda$ can be uncovered when the momentum $p$ is known:
$$ \lambda = \frac{h}{p}, \qquad (7.3) $$
where $h$ is Planck's constant (7.2).

Other famous physicists who have greatly influenced the development of quantum physics in the form we know it today are W. Heisenberg, M. Born, P. Jordan, W. Pauli, and P. A. M. Dirac.
7.2 Mathematical Framework for Quantum Theory
In this section, we shall first introduce the formalism of quantum mechanics in a basic form based on state vectors. Later we will also study a more general formalism based on self-adjoint operators, and we will see that the state vector formalism can be obtained as a restriction of the more general formalism. The advantage of the state vector formalism is that it is mathematically simpler than the more general one.
In connection with quantum computation, we are primarily interested in representing a finite set⁵ by using a quantum mechanical system. Therefore, we will make another significant mathematical simplification by assuming that all the quantum systems handled in this chapter are finite-dimensional, unless explicitly stated otherwise.

In the introductory chapter, we stated that we can describe a probabilistic system (with a finite phase space) by using a probability distribution
$$ p_1[x_1] + \dots + p_n[x_n] \qquad (7.4) $$
over all the configurations $x_i$. In the above mixed state, $p_1 + \dots + p_n = 1$, and the system can be seen in state $x_i$ with a probability of $p_i$. A quantum mechanical counterpart of this system is called an $n$-level quantum system. To describe such a system, we choose a basis $|x_1\rangle, \dots, |x_n\rangle$ of an $n$-dimensional Hilbert space $H_n$, and a general state of an $n$-level quantum system is described by a vector
$$ a_1 |x_1\rangle + \dots + a_n |x_n\rangle, \qquad (7.5) $$

⁴ "There are not two worlds, one of light and waves, one of matter and corpuscles. There is only a single universe. Some of its properties can be accounted for by wave theory, and others by the corpuscular theory." (A citation from the lecture given by Louis de Broglie at the Nobel prize award ceremony in 1929.)
⁵ In formal language theory, we also call a finite set representing information an alphabet.
where each $a_i$ is a complex number called the amplitude of $x_i$, and $|a_1|^2 + \dots + |a_n|^2 = 1$. The basis $|x_1\rangle, \dots, |x_n\rangle$ refers to an observable that can have some values, but for a moment we will ignore the values and say that the system may have properties $x_1, \dots, x_n$ (with respect to the basis chosen). The probability that the system is seen to have property $x_i$ is $|a_i|^2$. Representation (7.5) has some properties that (7.4) does not have. For example, the time evolution of (7.4) sends each basis vector again into a combination of all the basis vectors with non-negative coefficients that sum up to 1. On the other hand, (7.5) may evolve in a very different way, as the basis states can also cancel each other.
Example 7.2.1. Consider a quantum physical description of a two-state system having states 0 and 1. Assume also that the time evolution of the system (during a fixed time interval) is given by
$$ |0\rangle \mapsto \frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle, \qquad |1\rangle \mapsto \frac{1}{\sqrt{2}}|0\rangle - \frac{1}{\sqrt{2}}|1\rangle. $$
If the system starts in state $|0\rangle$ or $|1\rangle$ and undergoes the time evolution, the probability of observing 0 or 1 will be $\frac{1}{2}$ in both cases. On the other hand, if the system starts in state $|0\rangle$ and undergoes the time evolution twice, the state will be
$$ \frac{1}{\sqrt{2}}\Big( \frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle \Big) + \frac{1}{\sqrt{2}}\Big( \frac{1}{\sqrt{2}}|0\rangle - \frac{1}{\sqrt{2}}|1\rangle \Big) = \frac{1}{2}|0\rangle + \frac{1}{2}|1\rangle + \frac{1}{2}|0\rangle - \frac{1}{2}|1\rangle = |0\rangle, $$
and the probability of observing 0 becomes 1 again. The effect that the amplitudes of $|1\rangle$ cancel each other is called destructive interference, and the effect that the coefficients of $|0\rangle$ amplify each other is called constructive interference. Destructive interference cannot occur in the evolution of the probability distribution (7.4), since all the coefficients are always non-negative real numbers. A probabilistic counterpart of the quantum time evolution would be
$$ [0] \mapsto \frac{1}{2}[0] + \frac{1}{2}[1], \qquad [1] \mapsto \frac{1}{2}[0] + \frac{1}{2}[1], $$
but the double time evolution beginning with state $[0]$ would give state $\frac{1}{2}[0] + \frac{1}{2}[1]$ as the outcome.
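A few lines of numpy (my own sketch, not the book's) reproduce this contrast between quantum interference and probabilistic mixing:

```python
import numpy as np

U = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)   # the time evolution of Example 7.2.1
state = np.array([1.0, 0.0])           # start in |0>
after_one = U @ state
after_two = U @ after_one
print(np.abs(after_one) ** 2)          # [0.5 0.5]: both outcomes equally likely
print(np.abs(after_two) ** 2)          # [1. 0.]: the |1> amplitudes cancelled

M = np.array([[0.5, 0.5],
              [0.5, 0.5]])             # the probabilistic counterpart
print(M @ (M @ np.array([1.0, 0.0])))  # [0.5 0.5]: it mixes instead
```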
Amplitude distribution (7.5) can naturally be interpreted as a unit-length
vector of a Hilbert space. Therefore, the following sections are devoted to the
study of Hilbert spaces (see also Section 8.3).
7.2.1 Hilbert Spaces
A finite-dimensional Hilbert space $H$ is a complete (completeness will be defined later) vector space over the complex numbers which is equipped with an inner product $H \times H \to \mathbb{C}$, $(x, y) \mapsto \langle x \mid y \rangle$. All $n$-dimensional Hilbert spaces are isomorphic, and we can, therefore, denote any such space by $H_n$. An inner product is required to satisfy the following axioms for all $x, y, z \in H$ and $c_1, c_2 \in \mathbb{C}$:

1. $\langle x \mid y \rangle = \langle y \mid x \rangle^*$.
2. $\langle x \mid x \rangle \ge 0$, and $\langle x \mid x \rangle = 0$ if and only if $x = 0$.
3. $\langle x \mid c_1 y + c_2 z \rangle = c_1 \langle x \mid y \rangle + c_2 \langle x \mid z \rangle$.

If $E = \{e_1, \dots, e_n\}$ is an orthonormal basis of $H$, each vector $x \in H$ can be represented as $x = x_1 e_1 + \dots + x_n e_n$. With a fixed basis $E$, we also use the coordinate representation $x = (x_1, \dots, x_n)$. Using this representation, one can easily see that an orthonormal basis induces an inner product by
$$ \langle x \mid y \rangle = x_1^* y_1 + \dots + x_n^* y_n. \qquad (7.6) $$
On the other hand, each inner product is induced by some orthonormal basis as in (7.6). For, if $\langle \cdot \mid \cdot \rangle$ stands for an arbitrary inner product, we may use the Gram-Schmidt process (see Section 8.3) to find a basis $\{b_1, \dots, b_n\}$ orthonormal with respect to $\langle \cdot \mid \cdot \rangle$. Then
$$ \langle x \mid y \rangle = \langle x_1 b_1 + \dots + x_n b_n \mid y_1 b_1 + \dots + y_n b_n \rangle = x_1^* y_1 + \dots + x_n^* y_n. $$
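As an aside (my own illustration, not the book's), numpy's vdot computes exactly the inner product (7.6), conjugating its first argument in agreement with axiom 3, and the Gram-Schmidt process mentioned above can be sketched in a few lines:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent complex vectors."""
    basis = []
    for v in vectors:
        w = v.astype(complex)
        for b in basis:
            w = w - np.vdot(b, w) * b    # subtract the projection onto b
        basis.append(w / np.linalg.norm(w))
    return basis

b = gram_schmidt([np.array([1.0, 1.0]), np.array([1.0, 0.0])])
print(np.round(np.vdot(b[0], b[1]), 10))   # 0: the basis is orthonormal
```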
The inner product induces a vector norm in a very natural way:
$$ \|x\| = \sqrt{\langle x \mid x \rangle}. $$
Informally speaking, the completeness of the vector space $H$ means that there are enough vectors in $H$, i.e., at least one for each limit process:

Definition 7.2.1. A vector space $H$ is complete if, for each vector sequence $x_i$ such that
$$ \lim_{m,n \to \infty} \|x_m - x_n\| = 0, $$
there exists a vector $x \in H$ such that
$$ \lim_{n \to \infty} \|x_n - x\| = 0. $$
An important geometric property of Hilbert spaces is that each subspace $W \subseteq H$ which is also a Hilbert space⁶ has an orthogonal complement. The proofs of the following lemmata are left as exercises.

⁶ In finite-dimensional Hilbert spaces, all the subspaces are also Hilbert spaces.
Lemma 7.2.1. Let $W$ be a subspace of $H$. Then the set of vectors
$$W^{\perp} = \{ y \in H \mid \langle y \mid x \rangle = 0 \text{ whenever } x \in W \}$$
is also a subspace of $H$, which is called the orthogonal complement of $W$.
Lemma 7.2.2. If $W$ is a subspace of a finite-dimensional Hilbert space $H$, then $H = W \oplus W^{\perp}$.
7.2.2 Operators
The modern description of quantum mechanics is profoundly based on linear mappings. In the following sections, we present the features of linear mappings which are most essential for quantum mechanics. Because we mainly concentrate on finite-level quantum systems, the vector spaces treated hereafter will be assumed to have finite dimension, unless explicitly stated otherwise.
Let us begin with some terminology: a linear mapping $H \to H$ is called an operator. The set of operators on $H$ is denoted by $L(H)$. For an operator $T$, we define the norm of the operator by
$$\|T\| = \sup_{\|x\| = 1} \|Tx\|.$$
A nonzero vector $x \in H$ is an eigenvector of $T$ belonging to eigenvalue $\lambda \in \mathbb{C}$ if $Tx = \lambda x$.

Definition 7.2.2. For any operator $T : H \to H$, the adjoint operator $T^*$ is defined by the requirement
$$\langle x \mid Ty \rangle = \langle T^* x \mid y \rangle$$
for all $x, y \in H$.
Remark 7.2.1. With a fixed basis $\{e_1, \ldots, e_n\}$ of $H$, any operator $T$ can be represented as an $n \times n$ matrix over the field of complex numbers. It is not difficult to see that the matrix representing the adjoint operator $T^*$ is the transposed complex conjugate of the matrix representing $T$.

It can be shown that the quantity
$$\mathrm{Tr}(T) = \sum_{i=1}^{n} \langle e_i \mid T e_i \rangle \tag{7.7}$$
is independent of the choice of the orthonormal basis (Exercise 1). Quantity (7.7) is called the trace of operator $T$. By its very definition, it is clear that the trace is linear. Notice that, in the matrix representation of $T$, the trace is the sum of the diagonal elements.
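Both facts are easy to check numerically. Below is a small sketch (not from the book): the trace (7.7) computed in two different orthonormal bases, and the adjoint as the transposed complex conjugate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
T = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

# A random orthonormal basis, obtained from a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))

tr_standard = sum(T[i, i] for i in range(n))
tr_other = sum(Q[:, i].conj() @ T @ Q[:, i] for i in range(n))
print(np.isclose(tr_standard, tr_other))      # True: basis-independent

# <x | Ty> = <T*x | y> with T* the conjugate transpose.
x = rng.normal(size=n) + 1j * rng.normal(size=n)
y = rng.normal(size=n) + 1j * rng.normal(size=n)
print(np.isclose(x.conj() @ (T @ y), (T.conj().T @ x).conj() @ y))  # True
```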
Definition 7.2.3. An operator $T$ is self-adjoint if $T^* = T$. An operator $T$ is unitary if $T^* = T^{-1}$.

The following simple lemmata will be used frequently.

Lemma 7.2.3. A self-adjoint operator has real eigenvalues.

Proof. If $Ax = \lambda x$, then
$$\lambda^* \langle x \mid x \rangle = \langle \lambda x \mid x \rangle = \langle Ax \mid x \rangle = \langle x \mid Ax \rangle = \lambda \langle x \mid x \rangle.$$
Since $x \neq 0$ as an eigenvector, it follows that $\lambda^* = \lambda$. □

Lemma 7.2.4. The eigenvectors of a self-adjoint operator belonging to distinct eigenvalues are orthogonal.

Proof. Assume that $\lambda \neq \lambda'$, $Ax = \lambda x$, and $Ax' = \lambda' x'$. Since $\lambda$ and $\lambda'$ are real by the previous lemma,
$$\lambda' \langle x' \mid x \rangle = \langle Ax' \mid x \rangle = \langle x' \mid Ax \rangle = \lambda \langle x' \mid x \rangle,$$
and therefore $\langle x' \mid x \rangle = 0$. □
A self-adjoint operator $T$ is positive if $\langle x \mid Tx \rangle \ge 0$ for each $x \in H$.

If $W$ is a subspace of $H$ and $H = W \oplus W^{\perp}$, then each vector $x \in H$ can be uniquely represented as $x_W + x_{W^{\perp}}$, where $x_W \in W$ and $x_{W^{\perp}} \in W^{\perp}$. It is quite easy to verify that the mapping $P_W$ defined by $P_W(x_W + x_{W^{\perp}}) = x_W$ is a self-adjoint linear mapping, called the projection onto subspace $W$. Clearly, $P_W^2 = P_W$. On the other hand, it can be shown that each self-adjoint $P$ such that $P^2 = P$ is a projection onto some subspace of $H$ (Exercise 2). Notice that a projection can never increase the norm:
$$\|P_W x\|^2 = \|x_W\|^2 \le \|x_W\|^2 + \|x_{W^{\perp}}\|^2 = \|x_W + x_{W^{\perp}}\|^2 = \|x\|^2.$$
The second-last equality, which is called Pythagoras' theorem, follows from the fact that $x_W$ and $x_{W^{\perp}}$ are orthogonal. Projections play an important role in the theory of self-adjoint operators, and so we will study them in more detail.
7.2.3 Spectral Representation of Self-adjoint Operators
For any vectors $x$ and $y \in H_n$, we define a linear mapping $|x\rangle\langle y| : H_n \to H_n$ by
$$|x\rangle\langle y|\, z = \langle y \mid z \rangle\, x.$$
It is plain to see that, if $\|x\| = 1$, then $|x\rangle\langle x|$ is the projection onto the one-dimensional subspace generated by $x$. To introduce the spectral decomposition of a self-adjoint operator $T$, we present the following well-known result:
Theorem 7.2.1. Let $T : H_n \to H_n$ be a self-adjoint operator. There exists an orthonormal basis of $H_n$ that consists of eigenvectors of $T$.

Proof. The proof is made by induction on $n = \dim H_n$. A one-dimensional $H_1$ is generated by some single vector $e_1$ of unit length. Clearly $Te_1 = \lambda e_1$ for some $\lambda \in \mathbb{C}$, so the claim is obvious. If $n > 1$, let $\lambda$ be an eigenvalue of $T$ and let $W$ be the eigenspace of $\lambda$, that is, the subspace of $H_n$ generated by the eigenvectors belonging to $\lambda$. Any linear combination of the eigenvectors belonging to $\lambda$ is again an eigenvector belonging to $\lambda$, which implies that $W$ is closed under $T$. But also the orthogonal complement $W^{\perp}$ is closed under $T$: in fact, take any $x \in W$ and $y \in W^{\perp}$. Then
$$\langle x \mid Ty \rangle = \langle Tx \mid y \rangle = \lambda^* \langle x \mid y \rangle = 0,$$
which means that $Ty \in W^{\perp}$ as well. Therefore, we may study the restrictions $T : W \to W$ and $T : W^{\perp} \to W^{\perp}$ and apply the induction hypothesis to find orthonormal bases for $W$ and $W^{\perp}$ consisting of eigenvectors of $T$. Since $H_n = W \oplus W^{\perp}$, the union of these bases satisfies the claim. □
We can now introduce spectral representation, a powerful tool in the study of self-adjoint mappings. Let $x_1, \ldots, x_n$ be orthonormal eigenvectors of a self-adjoint operator $T : H_n \to H_n$, and let $\lambda_1, \ldots, \lambda_n$ be the corresponding eigenvalues. Numbers $\lambda_i$ are real by Lemma 7.2.3, but they are not necessarily distinct. The set of eigenvalues is called the spectrum, and it can be easily verified (Exercise 3) that
$$T = \lambda_1 |x_1\rangle\langle x_1| + \cdots + \lambda_n |x_n\rangle\langle x_n|. \tag{7.8}$$
Decomposition (7.8) is called a spectral representation of $T$. If $T$ has multiple eigenvalues, we say that $T$ is degenerate; otherwise, $T$ is non-degenerate. The spectral representation (7.8) of a non-degenerate $T$ is unique, as is easily verified. Notice that we do not claim that the vectors $x_i$ are unique: only that the projections $|x_i\rangle\langle x_i|$ are. If $T$ is degenerate, we can collect the multiple eigenvalues in (7.8) as common factors to obtain a representation
$$T = \lambda'_1 P_1 + \cdots + \lambda'_{n'} P_{n'}, \tag{7.9}$$
where $P_1, \ldots, P_{n'}$ are the projections onto the eigenspaces of $\lambda'_1, \ldots, \lambda'_{n'}$. It is easy to see that the spectral representation (7.9) is unique.⁷

Recall that all the eigenvectors belonging to distinct eigenvalues are orthogonal, which implies that, in representation (7.9), all the projections are projections onto mutually orthogonal subspaces. Therefore, $P_i P_j = 0$ whenever $i \neq j$. It follows that, if $p$ is a polynomial, then
$$p(T) = p(\lambda'_1) P_1 + \cdots + p(\lambda'_{n'}) P_{n'}. \tag{7.10}$$

⁷ Some authors call only the unique representation (7.9) a spectral representation, not representation (7.8).
We also generalize (7.10): if $f : \mathbb{R} \to \mathbb{C}$ is any function and (7.9) is the spectral representation of $T$, we define
$$f(T) = f(\lambda'_1) P_1 + \cdots + f(\lambda'_{n'}) P_{n'}. \tag{7.11}$$
Example 7.2.2. The matrix
$$M_{\neg} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
defines a self-adjoint mapping in $H_2$, as is easily verified. Matrix $M_{\neg}$ is non-degenerate, having 1 and $-1$ as eigenvalues. The corresponding orthonormal eigenvectors can be chosen as $x_1 = \frac{1}{\sqrt{2}}(1, 1)^T$ and $x_{-1} = \frac{1}{\sqrt{2}}(1, -1)^T$, for example. The matrix representations of both projections $|x_{\pm 1}\rangle\langle x_{\pm 1}|$ are easily uncovered: they are
$$\begin{pmatrix} \frac12 & \frac12 \\ \frac12 & \frac12 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} \frac12 & -\frac12 \\ -\frac12 & \frac12 \end{pmatrix},$$
respectively. Thus, the spectral representation of $M_{\neg}$ is given by
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = 1 \cdot \begin{pmatrix} \frac12 & \frac12 \\ \frac12 & \frac12 \end{pmatrix} - 1 \cdot \begin{pmatrix} \frac12 & -\frac12 \\ -\frac12 & \frac12 \end{pmatrix}.$$
We can now find a square root of $M_{\neg}$:
$$\sqrt{M_{\neg}} = \sqrt{1} \cdot \begin{pmatrix} \frac12 & \frac12 \\ \frac12 & \frac12 \end{pmatrix} + \sqrt{-1} \cdot \begin{pmatrix} \frac12 & -\frac12 \\ -\frac12 & \frac12 \end{pmatrix}.$$
Choosing $\sqrt{1} = 1$ and $\sqrt{-1} = i$, we obtain the matrix of Example 2.2.1. By fixing the square roots above in all possible ways, we get four different matrices $X$ that satisfy the condition $X^2 = M_{\neg}$.
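The construction is easy to check mechanically. The following is a sketch (not the book's code) that builds a square root of $M_{\neg}$ from the two eigenprojections above:

```python
import numpy as np

M = np.array([[0.0, 1.0],
              [1.0, 0.0]])
P_plus = np.array([[0.5, 0.5],
                   [0.5, 0.5]])      # projection onto the eigenvalue +1
P_minus = np.array([[0.5, -0.5],
                    [-0.5, 0.5]])    # projection onto the eigenvalue -1

sqrt_M = 1 * P_plus + 1j * P_minus   # the choice sqrt(1) = 1, sqrt(-1) = i
print(np.allclose(sqrt_M @ sqrt_M, M))   # True: X^2 = M
```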
7.2.4 Spectral Representation of Unitary Operators
Let us study the function $e^{iT}$, defined for a self-adjoint operator $T$ given by the spectral representation
$$T = \lambda_1 |x_1\rangle\langle x_1| + \cdots + \lambda_n |x_n\rangle\langle x_n|.$$
By definition,
$$e^{iT} = e^{i\lambda_1} |x_1\rangle\langle x_1| + \cdots + e^{i\lambda_n} |x_n\rangle\langle x_n|,$$
and we notice that
$$(e^{iT})^* = e^{-i\lambda_1} |x_1\rangle\langle x_1| + \cdots + e^{-i\lambda_n} |x_n\rangle\langle x_n| = (e^{iT})^{-1},$$
which is to say that $e^{iT}$ is unitary. We will now show that each unitary mapping can be represented as $e^{iT}$, where $T$ is a self-adjoint operator. To do this, we first derive an auxiliary result which also has independent interest.
Lemma 7.2.5. Self-adjoint operators $A$ and $B$ commute if and only if $A$ and $B$ share an orthonormal eigenvector basis.

Proof. If $\{a_1, \ldots, a_n\}$ is a set of orthonormal eigenvectors of both $A$ and $B$, then, by using the corresponding eigenvalues, we can write
$$A = \lambda_1 |a_1\rangle\langle a_1| + \cdots + \lambda_n |a_n\rangle\langle a_n|$$
and
$$B = \mu_1 |a_1\rangle\langle a_1| + \cdots + \mu_n |a_n\rangle\langle a_n|.$$
Hence,
$$AB = \lambda_1\mu_1 |a_1\rangle\langle a_1| + \cdots + \lambda_n\mu_n |a_n\rangle\langle a_n| = BA.$$
Assume, then, that $AB = BA$. Let $\lambda_1, \ldots, \lambda_h$ be all the distinct eigenvalues of $A$ and let $a_1^{(k)}, \ldots, a_{m_k}^{(k)}$ be orthonormal eigenvectors belonging to $\lambda_k$. For any $a_i^{(k)}$, we have
$$ABa_i^{(k)} = BAa_i^{(k)} = B\lambda_k a_i^{(k)} = \lambda_k Ba_i^{(k)},$$
i.e., $Ba_i^{(k)}$ is also an eigenvector of $A$ belonging to eigenvalue $\lambda_k$. Therefore, the eigenspace of $\lambda_k$ is closed under $B$. We denote this subspace by $W_k$. As a restriction of a self-adjoint operator, clearly $B : W_k \to W_k$ is also self-adjoint. Therefore, $B$ has $m_k$ eigenvectors $b_1^{(k)}, \ldots, b_{m_k}^{(k)}$ that form an orthonormal basis of $W_k$. By finding such a basis for each $W_k$, we obtain an orthonormal eigenvector system such that:

• Each $b_i^{(k)}$ is an eigenvector of $B$ and of $A$ (for $A$, $b_i^{(k)}$ is an eigenvector belonging to $\lambda_k$).
• If $i \neq j$, then $b_i^{(k)}$ and $b_j^{(k)}$ are orthogonal, since they belong to an orthonormal basis generating $W_k$.
• If $k \neq k'$, then $b_i^{(k)}$ and $b_j^{(k')}$ are orthogonal, since they are eigenvectors of $A$ belonging to distinct eigenvalues $\lambda_k$ and $\lambda_{k'}$.

Thus, the system obtained is an orthonormal basis of $H_n$, but also a set of eigenvectors of both $A$ and $B$. □
Now let $U$ be a unitary operator. We notice, first, that the eigenvalues of $U$ have absolute value 1. To verify this, let $x$ be an eigenvector belonging to eigenvalue $\lambda$, i.e., $Ux = \lambda x$. Then
$$\langle x \mid x \rangle = \langle x \mid U^*Ux \rangle = \langle Ux \mid Ux \rangle = \langle \lambda x \mid \lambda x \rangle = |\lambda|^2 \langle x \mid x \rangle,$$
and it follows that $|\lambda| = 1$.
We decompose $U$ into "real and imaginary parts" by writing $U = A + iB$, where $A = \frac12(U + U^*)$ and $B = \frac{1}{2i}(U - U^*)$. Note that $A$ and $B$ are now self-adjoint, commuting operators. According to the previous lemma, we have spectral representations
$$A = \lambda_1 |x_1\rangle\langle x_1| + \cdots + \lambda_n |x_n\rangle\langle x_n|$$
and
$$B = \mu_1 |x_1\rangle\langle x_1| + \cdots + \mu_n |x_n\rangle\langle x_n|$$
with respect to a common orthonormal eigenvector basis. Since the eigenvalues of $U$ are of absolute value 1, it follows that the eigenvalues of $A$ and $B$, i.e., the numbers $\lambda_i$ and $\mu_i$, have absolute values of at most 1. But since $A$ and $B$ are self-adjoint, the numbers $\lambda_i$ and $\mu_i$ are also real. Thus, $U$ can be written as
$$U = (\lambda_1 + i\mu_1) |x_1\rangle\langle x_1| + \cdots + (\lambda_n + i\mu_n) |x_n\rangle\langle x_n|,$$
where $\lambda_j$ and $\mu_j$ are real numbers in the interval $[-1, 1]$. Moreover, $\lambda_j^2 + \mu_j^2 = 1$, so there exists a unique $\theta_j \in [0, 2\pi)$ for each $j$ such that $\lambda_j = \cos\theta_j$ and $\mu_j = \sin\theta_j$. It follows that $\lambda_j + i\mu_j = e^{i\theta_j}$ and $U$ can be expressed as $U = e^{iH}$, where
$$H = \theta_1 |x_1\rangle\langle x_1| + \cdots + \theta_n |x_n\rangle\langle x_n|.$$
Representation
$$U = e^{i\theta_1} |x_1\rangle\langle x_1| + \cdots + e^{i\theta_n} |x_n\rangle\langle x_n|$$
is called the spectral representation of unitary operator $U$. Operator $H$ (or $-H$) is sometimes called the Hamilton operator which induces $U$. Notice also that the way we derived the spectral representation of a unitary operator gives us, as a byproduct, the knowledge that a unitary matrix has eigenvectors that form an orthonormal basis of $H_n$.
The spectral representation of a unitary operator could, of course, have been derived directly, without using the decomposition $U = A + iB$. In any case, the spectral representation is useful when decomposing a unitary matrix into a product of simple unitary matrices. Notice also that if $T_1$ and $T_2$ commute, then clearly $e^{i(T_1 + T_2)} = e^{iT_1} e^{iT_2}$.

In the following examples, we will illustrate how unitary matrices and the Hamiltonians which induce them are related.
Example 7.2.3. The Hadamard-Walsh matrix
$$W_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$
is unitary but also self-adjoint. Thus, the decomposition $W_2 = A + iB$ is just $W_2$ itself. Matrix $W_2$ has two eigenvalues, $-1$ and 1, and the corresponding orthonormal eigenvectors can be chosen as $x_{-1} = \frac{1}{\sqrt{4 - 2\sqrt{2}}}(1 - \sqrt{2}, 1)^T$ and $x_1 = \frac{1}{\sqrt{4 + 2\sqrt{2}}}(1 + \sqrt{2}, 1)^T$. The corresponding projections can be expressed as
$$|x_1\rangle\langle x_1| = \frac12 \begin{pmatrix} 1 + \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & 1 - \frac{1}{\sqrt{2}} \end{pmatrix} \quad\text{and}\quad |x_{-1}\rangle\langle x_{-1}| = \frac12 \begin{pmatrix} 1 - \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & 1 + \frac{1}{\sqrt{2}} \end{pmatrix}.$$
The spectral representation of the self-adjoint mapping $W_2$ is, thus, given by
$$W_2 = 1 \cdot |x_1\rangle\langle x_1| - 1 \cdot |x_{-1}\rangle\langle x_{-1}|,$$
which can also be written as $W_2 = e^{i \cdot 0} |x_1\rangle\langle x_1| + e^{i\pi} |x_{-1}\rangle\langle x_{-1}|$, so $W_2 = e^{iT}$, where
$$T = \pi |x_{-1}\rangle\langle x_{-1}|.$$
Example 7.2.4. Let $\theta$ be a real number and
$$R_{\theta} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
be the rotation matrix. Clearly, $R_{\theta}$ is unitary. Now that we know that unitary matrices also have eigenvectors forming an orthonormal basis, we can find them directly, without seeking a decomposition $R_{\theta} = A + iB$. It is an easy task to verify that the eigenvalues of $R_{\theta}$ are $e^{\pm i\theta}$, and the corresponding eigenvectors can be chosen as $x_+ = \frac{1}{\sqrt{2}}(i, 1)^T$ and $x_- = \frac{1}{\sqrt{2}}(-i, 1)^T$. The corresponding projections are given by
$$|x_+\rangle\langle x_+| = \frac12 \begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix} \quad\text{and}\quad |x_-\rangle\langle x_-| = \frac12 \begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix}.$$
Thus $R_{\theta} = e^{i\theta} |x_+\rangle\langle x_+| + e^{-i\theta} |x_-\rangle\langle x_-| = e^{iH_{\theta}}$, where
$$H_{\theta} = \begin{pmatrix} 0 & i\theta \\ -i\theta & 0 \end{pmatrix}.$$
Example 7.2.5. Let $\alpha$ and $\beta$ be real numbers. The phase shift matrix
$$P_{\alpha,\beta} = \begin{pmatrix} e^{i\alpha} & 0 \\ 0 & e^{i\beta} \end{pmatrix}$$
is unitary, as is easily verified. The spectral decomposition is now trivial:
$$P_{\alpha,\beta} = e^{i\alpha} \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + e^{i\beta} \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$
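The relation between a unitary matrix and its inducing Hamiltonian is easy to test numerically. Below is a check (a sketch, not from the book) of Example 7.2.4, using the matrix exponential from SciPy:

```python
import numpy as np
from scipy.linalg import expm

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
H_theta = np.array([[0, 1j * theta],
                    [-1j * theta, 0]])

print(np.allclose(H_theta, H_theta.conj().T))  # True: H_theta is self-adjoint
print(np.allclose(R, expm(1j * H_theta)))      # True: R_theta = e^{i H_theta}
```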
7.3 Quantum States as Hilbert Space Vectors
Let us return to the study of finite-level quantum systems. The mathematical description of such a system is based on an $n$-dimensional Hilbert space $H_n$. We choose an orthonormal basis $\{|x_1\rangle, \ldots, |x_n\rangle\}$ and call the vectors $|x_i\rangle$ basis states. A state of the system is a unit-length vector in $H_n$. States other than the basis ones are called superpositions of the basis states. Two states $|x\rangle$ and $|y\rangle$ are equivalent if $|x\rangle = e^{i\theta}|y\rangle$ for some real $\theta$. Hereafter, we will regard equivalent states as equal.

A general state
$$|x\rangle = \alpha_1 |x_1\rangle + \cdots + \alpha_n |x_n\rangle \tag{7.12}$$
determines a probability distribution: if the system in state (7.12) is observed, the system will be seen in a basis state $|x_i\rangle$ with a probability of $|\alpha_i|^2$. The coefficients $\alpha_i$ are called amplitudes. We will study the observations in more detail in Section 7.3.2.
For now, we present a mathematical description of compound quantum systems: suppose we have $n$- and $m$-state distinguishable systems⁸ with bases $\{|x_1\rangle, \ldots, |x_n\rangle\}$ of $H_n$ and $\{|y_1\rangle, \ldots, |y_m\rangle\}$ of $H_m$, respectively. The compound system made of these subsystems is described by the tensor product $H_n \otimes H_m \cong H_{nm}$. The basis states of $H_n \otimes H_m$ are
$$|x_i\rangle \otimes |y_j\rangle,$$
where $i \in \{1, \ldots, n\}$ and $j \in \{1, \ldots, m\}$. We also use the notations $|x_i\rangle \otimes |y_j\rangle = |x_i\rangle |y_j\rangle = |x_i, y_j\rangle$. A general state $|z\rangle$ of the compound system is a unit-length vector in $H_{nm}$. We say that a state $|z\rangle$ is decomposable if
$$|z\rangle = |x\rangle |y\rangle$$
for some states $|x\rangle \in H_n$ and $|y\rangle \in H_m$. A state that is not decomposable is entangled. The inner product in the space $H_n \otimes H_m$ is defined on the basis states by
$$\langle x_i \otimes y_j \mid x_k \otimes y_l \rangle = \langle x_i \mid x_k \rangle \langle y_j \mid y_l \rangle.$$

⁸ The requirement that the systems are distinguishable is essential here. A compound system of two identical subsystems has a different description.
Example 7.3.1. A two-level quantum system is called a qubit. The orthonormal basis vectors chosen for $H_2$ are usually denoted by $|0\rangle$ and $|1\rangle$ and are identified with logical 0 and 1. A compound system of $m$ qubits is called a quantum register of length $m$ and is described by a Hilbert space having dimension $2^m$ and basis
$$\{ |x_1\rangle \otimes \cdots \otimes |x_m\rangle \mid x_i \in \{0, 1\} \}.$$
We can identify the binary sequence $x_1, \ldots, x_m$ with the number $x_1 2^{m-1} + x_2 2^{m-2} + \cdots + x_{m-1} 2 + x_m$. Using this identification, the basis of $H_{2^m}$ can be expressed as
$$\{ |0\rangle, |1\rangle, \ldots, |2^m - 1\rangle \}.$$
7.3.1 Quantum Time Evolution
Our next task is to describe how quantum systems change in time. For that purpose, we will assume, as is usually done, that there is some function $U_t : H_n \to H_n$ which depends on the time $t$ and describes the time evolution of the system.⁹ In other words, denoting the state of the system at time $t$ by $x(t)$,
$$x(t) = U_t x(0).$$
We will now list some requirements that are usually put on the time-evolution mapping $U_t$. The very first requirement put on $U_t$ is that it should map unit-length states to unit-length states; that is, it should preserve the norm:

1. For each $t \in \mathbb{R}$ and each $x \in H_n$, $\|U_t x\| = \|x\|$.

Notice that the above requirement is quite mathematical: it just states that, if the coefficients of $x$ determine a probability distribution as in (7.12), then the coefficients of $U_t x$ should do so as well. The second requirement, on the other hand, is that the superposition (7.12) should strongly resemble the probability distribution, so strongly that each $U_t$ operates on (7.12) in such a way that each basis state evolves independently:

2. For each $t$, $U_t(\alpha_1 |x_1\rangle + \cdots + \alpha_n |x_n\rangle) = \alpha_1 U_t |x_1\rangle + \cdots + \alpha_n U_t |x_n\rangle$. This means that each $U_t$ is linear.

We could seriously argue about the above requirement,¹⁰ but that is not the purpose of the present book. The next requirement is that $U_t$ can always be decomposed:

3. For all $t_1$ and $t_2 \in \mathbb{R}$, $U_{t_1 + t_2} = U_{t_1} U_{t_2}$.
⁹ This assumption is referred to as the causality principle.
¹⁰ If a state (7.12) is not just a generalization of a probability distribution, why should it evolve like one?
Finally, we require that the time evolution must be smooth:

4. For each $t_0 \in \mathbb{R}$, $\lim_{t \to t_0} U_t x(0) = \lim_{t \to t_0} x(t) = x(t_0)$.

From the above requirements, we can derive the following characteristic result.

Lemma 7.3.1. A time evolution mapping $U_t$ satisfying 1-3 is a unitary operator.

Proof. Condition 2 states that $U_t$ is an operator, and from condition 1 it follows directly that each $U_t$ is injective. Requirement 3 implies that $U_0 = U_0^2$. But $U_0 |x\rangle = U_0^2 |x\rangle$ implies, by the injectivity, that $|x\rangle = U_0 |x\rangle$ for any $|x\rangle$, which is to say that $U_0$ is the identity mapping. By again using requirement 3, we see that $|x\rangle = U_t U_{-t} |x\rangle$, i.e., each $U_t$ is also surjective. According to requirement 1, each $U_t$ preserves the norm, and it follows, by the polarization equation (Exercise 4)
$$\langle x \mid y \rangle = \tfrac14\bigl(\|x + y\|^2 - \|x - y\|^2\bigr) + \tfrac{i}{4}\bigl(\|x - iy\|^2 - \|x + iy\|^2\bigr), \tag{7.13}$$
that a linear mapping preserving the norm preserves the inner products as well. Therefore,
$$\langle U_t x \mid U_t y \rangle = \langle x \mid y \rangle,$$
which means, by its very definition, that $U_t$ is unitary. □
Condition 4 is still unused. Its place is found after the following theorem by M. Stone; see [49].

Theorem 7.3.1. If each $U_t$ satisfies 1-4, then there is a unique self-adjoint operator $H$ such that
$$U_t = e^{-itH}.$$

Recall that the exponential function of a self-adjoint operator $A$ can be defined by using the spectral representation (7.11), or even as the series
$$e^A = I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots.$$
It is clear that the definitions coincide in a finite-dimensional $H_n$. By Stone's theorem, the time evolution can be expressed as
$$x(t) = e^{-itH} x(0),$$
from which we get
$$\frac{d}{dt}x(t) = -iHe^{-itH}x(0) = -iHx(t)$$
by component-wise differentiation. This can also be written as
$$i\frac{d}{dt}x(t) = Hx(t). \tag{7.14}$$
Differential equation (7.14) is known as the abstract Schrödinger equation, and the operator $H$ is called the Hamilton operator of the quantum system. In a typical situation, the Hamilton operator (which represents the total energy of the system) is known, and the Schrödinger equation is used to determine the energy eigenstates, which form an example of the basis states of a quantum system.

Remark 7.3.1. The time evolution described by operators $U_t$ is continuous but, in connection with computational aspects, we are interested in the system state at discrete time points $t_1, t_2, t_3, \ldots$. Therefore, we will regard the quantum system evolution as a sequence of unit-length vectors $x$, $U_1 x$, $U_2 U_1 x$, $U_3 U_2 U_1 x, \ldots$, where each $U_i$ is unitary.
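The discrete-time view of Remark 7.3.1 is easy to simulate. The following is a sketch with an assumed example Hamiltonian (not from the book): each step applies $U = e^{-itH}$, and requirement 1 shows up as preservation of the norm.

```python
import numpy as np
from scipy.linalg import expm

H = np.array([[1.0, 0.5],
              [0.5, -1.0]])          # any self-adjoint matrix will do here
U = expm(-1j * 0.3 * H)              # evolution over a fixed interval t = 0.3
print(np.allclose(U @ U.conj().T, np.eye(2)))   # True: U is unitary

x = np.array([1.0 + 0j, 0.0])        # start in basis state |0>
for _ in range(3):                   # the sequence x, Ux, UUx, UUUx
    x = U @ x
    print(np.linalg.norm(x))         # stays 1.0: the norm is preserved
```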
7.3.2 Observables
We again study a general state
$$|x\rangle = \alpha_1 |x_1\rangle + \cdots + \alpha_n |x_n\rangle. \tag{7.15}$$
The basis states were introduced in connection with the distinguishable properties which we are interested in, and a general state, i.e., a superposition of basis states, induces a probability distribution on the basis states: a basis state $x_i$ is observed with a probability of $|\alpha_i|^2$. This will be generalized as follows:

Definition 7.3.1. An observable $E = \{E_1, \ldots, E_m\}$ is a collection of mutually orthogonal subspaces of $H_n$ such that
$$H_n = E_1 \oplus \cdots \oplus E_m.$$

In the above definition, the inequality $m \le n$ must, of course, hold. We equip the subspaces $E_i$ with distinct real number "labels" $\theta_1, \ldots, \theta_m$. For each vector $x \in H_n$, there is a unique representation
$$x = x_1 + \cdots + x_m \tag{7.16}$$
such that $x_i \in E_i$. Instead of observing spaces $E_i$, we can talk about observing the labels $\theta_i$: we say that, by observing $E$, value $\theta_i$ will be seen with a probability of $\|x_i\|^2$.¹¹

¹¹ The original starting point is, of course, the converse: instead of talking about observing subspaces, one talks about measuring some physical quantity and obtaining a real number as the outcome. However, here it seems to be logically more consistent to introduce these quantity values as labels of subspaces.
Example 7.3.2. The notion of observing the basis states $x_i$ is a special case: the observable $E$ can be defined as $E = \{E_1, \ldots, E_n\}$, where each $E_i$ is the one-dimensional subspace spanned by $x_i$. We can, for instance, equip $E_i$ with the label $i$. If the system is in state (7.15), the value $i$ is observed with a probability of $|\alpha_i|^2$.
Another view of the observables can be achieved in the following way: if $H_n = E_1 \oplus \cdots \oplus E_m$, let $P_{E_i}$ be the projection onto the subspace $E_i$. Thus, we may also think of observables as (sometimes partially defined) mappings $E : \mathbb{R} \to L(H)$ that associate a projection $E(\theta_i) = P_{E_i}$ to each label $\theta_i$. Following the notation (7.16),
$$x_i = E(\theta_i)x,$$
and the probability of observing the label $\theta_i$ when the system state is $|x\rangle$ is given by
$$\|x_i\|^2 = \langle E(\theta_i)x \mid E(\theta_i)x \rangle = \langle x \mid E(\theta_i)x \rangle.$$
Mapping $E : \mathbb{R} \to L(H)$ has other interesting properties: the probability of observing either $\theta_i$ or $\theta_j$ is clearly
$$\langle x \mid (E(\theta_i) + E(\theta_j))x \rangle,$$
and the probability that some $\theta_i$ is observed is
$$\langle x \mid (E(\theta_1) + \cdots + E(\theta_m))x \rangle = 1 = \langle x \mid Ix \rangle.$$
Thus, we may extend $E$ to be defined on sets of real numbers by setting $E(\{\theta_i, \theta_j\}) = E(\theta_i) + E(\theta_j)$ if $\theta_i \neq \theta_j$. We can easily see that, as a mapping from subsets of $\mathbb{R}$ to $L(H)$, $E$ satisfies the following conditions:
1. For each $X \subseteq \mathbb{R}$, $E(X)$ is a projection.
2. $E(\mathbb{R}) = I$.
3. $E(\bigcup X_i) = \sum E(X_i)$ for a disjoint sequence $X_i$ of sets.

A mapping $E$ that satisfies the above conditions is called a projection-valued measure.¹² Viewing an observable as a projection-valued measure represents an alternative way of thinking than Definition 7.3.1. In that definition, we merely equipped each subspace with a real number, a label, which is thought to be observed instead of the actual subspace. Here we associate to a set of labels (real numbers) a projection which defines a subspace. If $X$ is a set of real numbers, then $\langle x \mid E(X)x \rangle$ is just the probability of seeing a number in $X$.

It follows from Condition 3 that, if $X$ and $Y$ are disjoint sets, then the corresponding projections $E(X)$ and $E(Y)$ project onto mutually orthogonal subspaces. In fact, since $E(X)(H_n)$ is a subspace of $E(X \cup Y)(H_n)$, we must have $E(X)E(X \cup Y) = E(X)$ and, therefore, $E(X) = E(X)E(X \cup Y) = E(X) + E(X)E(Y)$, so $E(X)E(Y) = 0$.

¹² The notion of a projection-valued measure is not, in general, defined on all subsets of $\mathbb{R}$, but on a $\sigma$-algebra which has enough "regular features". Typically, the Borel subsets of $\mathbb{R}$ will suffice. However, we are now interested in finite-level quantum systems, so there are only finitely many real numbers associated with each observable.
The third, and perhaps the most traditional, view of observables can be achieved by regarding an observable as a self-adjoint operator
$$A = \theta_1 E(\theta_1) + \cdots + \theta_m E(\theta_m),$$
where $E(\theta_i) = E(\{\theta_i\})$ are projections onto mutually orthogonal subspaces. Recalling that the probability of observing $\theta_i$ when the system is in state $x$ is $\langle x \mid E(\theta_i)x \rangle$, we can compute the expected value of an observable $A$ in state $x$. The expected value is
$$\mathbb{E}_x(A) = \theta_1 \langle x \mid E(\theta_1)x \rangle + \cdots + \theta_m \langle x \mid E(\theta_m)x \rangle = \langle x \mid (\theta_1 E(\theta_1) + \cdots + \theta_m E(\theta_m))x \rangle = \langle x \mid Ax \rangle.$$
It now seems to be a good moment to collect together all the viewpoints of observables which we have so far.

• An observable can be defined as a collection of mutually orthogonal subspaces $\{E_1, \ldots, E_m\}$ such that $H = E_1 \oplus \cdots \oplus E_m$. Each subspace is equipped with a real number label $\theta_i$. This point of view most closely resembles the original idea of thinking about a quantum state as a generalization of a probability distribution. In fact, observing the state of a quantum system can be seen as learning the value of an observable which is defined as a collection of one-dimensional subspaces spanned by the basis states.
• An observable can be seen as a projection-valued measure $E$ which maps a set of labels to a projection that defines a subspace. This viewpoint offers us a method for generalizing the notion of an observable into a positive operator-valued measure.
• The traditional view is to define an observable as a self-adjoint operator via the spectral representation
$$A = \theta_1 E(\theta_1) + \cdots + \theta_m E(\theta_m).$$

All these viewpoints are logically equivalent. In what follows, we will use all of them, choosing the one which is best suited for each situation.
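The agreement between the viewpoints is easy to see numerically. Below is a sketch (not the book's code): the eigenprojections of a self-adjoint $A$ give the outcome probabilities, and $\langle x \mid Ax \rangle$ gives the expected value.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])                   # labels = eigenvalues +1 and -1
labels, vecs = np.linalg.eigh(A)             # columns of vecs are eigenvectors

x = np.array([np.sqrt(0.8), np.sqrt(0.2)])   # a state: |a_1|^2 + |a_2|^2 = 1

probs = np.abs(vecs.conj().T @ x) ** 2       # ||E(theta_i) x||^2
print(probs, probs.sum())                    # a probability distribution

expected = x.conj() @ A @ x                  # <x | Ax>
print(np.isclose(expected, labels @ probs))  # True: both expectations agree
```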
7.3.3 The Uncertainty Principles
In this section, we first present an interesting classical result connected to the simultaneous measurement of two observables. We will view the observables as self-adjoint mappings. We say that two observables $A$ and $B$ are commutative if $AB = BA$. We also define the commutator of $A$ and $B$ as the mapping $[A, B] = AB - BA$. Recall that, for a given state $x$, the expected value of observable $A$ is given by $\mathbb{E}_x(A) = \langle x \mid Ax \rangle$. For short, we denote $\mu = \mathbb{E}_x(A)$. The variance of $A$ (in state $x$) is the expected value of observable $(A - \mu)^2$. Notice that, if $A$ is self-adjoint, then $(A - \mu)^2$ is also self-adjoint and, thus, it also defines an observable. With these notations, the variance can be expressed as
$$\mathrm{Var}_x(A) = \mathbb{E}_x((A - \mu)^2) = \langle x \mid (A - \mu)^2 x \rangle = \langle (A - \mu)x \mid (A - \mu)x \rangle = \|(A - \mu)x\|^2. \tag{7.17}$$
Theorem 7.3.2 (The Uncertainty Principle). For any observables $A$, $B$ and any state $x$,
$$\mathrm{Var}_x(A)\,\mathrm{Var}_x(B) \ge \frac14 \bigl|\langle x \mid [A, B]x \rangle\bigr|^2.$$

Proof. Notice first that, for self-adjoint $A$ and $B$, $[A, B]^* = -[A, B]$, so the commutator of $A$ and $B$ can be written as $[A, B] = iC$, where $C = -i[A, B]$ is self-adjoint. If $\mu_A$ and $\mu_B$ are the expected values of $A$ and $B$, respectively, we get
$$\mathrm{Var}_x(A)\,\mathrm{Var}_x(B) = \|(A - \mu_A)x\|^2 \|(B - \mu_B)x\|^2 \ge \bigl|\langle (A - \mu_A)x \mid (B - \mu_B)x \rangle\bigr|^2,$$
using representation (7.17) and the Cauchy-Schwarz inequality. To shorten the notations, we write $A_1 = A - \mu_A$ and $B_1 = B - \mu_B$. Clearly, $A_1$ and $B_1$ are also self-adjoint, and it is a straightforward task to verify that $[A_1, B_1] = [A, B]$ and that $A_1 B_1 + B_1 A_1$ is self-adjoint. Thus,
$$|\langle A_1 x \mid B_1 x \rangle|^2 = |\langle x \mid A_1 B_1 x \rangle|^2 = \bigl|\langle x \mid \tfrac12(A_1 B_1 + B_1 A_1)x \rangle + \langle x \mid \tfrac12(A_1 B_1 - B_1 A_1)x \rangle\bigr|^2 = \tfrac14 \bigl|\langle x \mid (A_1 B_1 + B_1 A_1)x \rangle + i\langle x \mid Cx \rangle\bigr|^2,$$
where $C = -i[A, B]$ is a self-adjoint operator. Because of the self-adjointness of the above operators, both inner products are real. Therefore,
$$\mathrm{Var}_x(A)\,\mathrm{Var}_x(B) \ge \tfrac14 \bigl|\langle x \mid (A_1 B_1 + B_1 A_1)x \rangle + i\langle x \mid Cx \rangle\bigr|^2 = \tfrac14\bigl(\langle x \mid (A_1 B_1 + B_1 A_1)x \rangle^2 + \langle x \mid Cx \rangle^2\bigr) \ge \tfrac14 \langle x \mid Cx \rangle^2,$$
which can also be written as
$$\mathrm{Var}_x(A)\,\mathrm{Var}_x(B) \ge \frac14 \bigl|\langle x \mid [A, B]x \rangle\bigr|^2,$$
as claimed. □
Remark 7.3.2. A classical example of noncommuting observables is given by position and momentum. It turns out that the commutator of these two observables is a homothety, i.e., multiplication by a nonzero constant, so Theorem 7.3.2 demonstrates that, in any state, the product of the variances of position and momentum has a positive lower bound. This is known as Heisenberg's uncertainty principle.
The uncertainty principle (Theorem 7.3.2) was criticized by D. Deutsch [22] on the grounds that the lower bound provided by Theorem 7.3.2 is not generally fixed, but depends on the state $x$. Deutsch himself in [22] gave a state-independent lower bound for the sum of the uncertainties of two observables. Here we present an improved version of this bound, due to Maassen and Uffink [42].

We say that a sequence $(p_1, \ldots, p_n)$ of real numbers is a probability distribution if $p_i \ge 0$ and $p_1 + \cdots + p_n = 1$. For a probability distribution $P = (p_1, \ldots, p_n)$, the Shannon entropy of the distribution is defined to be
$$H(P) = -(p_1 \log p_1 + \cdots + p_n \log p_n),$$
where $0 \cdot \log 0$ is defined to be 0. For the basic properties of Shannon entropy, see Section 8.5.
Let $A$ and $B$ be some observables of an $n$-state quantum system (again regarded as self-adjoint operators). As discussed before, there are orthonormal bases of $H_n$ consisting of the eigenvectors of $A$ and $B$. Denoting these bases by $\{a_1, \ldots, a_n\}$ and $\{b_1, \ldots, b_n\}$, we get spectral representations of $A$ and $B$:
$$A = \lambda_1 |a_1\rangle\langle a_1| + \cdots + \lambda_n |a_n\rangle\langle a_n|,$$
$$B = \mu_1 |b_1\rangle\langle b_1| + \cdots + \mu_n |b_n\rangle\langle b_n|,$$
where $\lambda_i$ and $\mu_i$ are the eigenvalues of $A$ and $B$, respectively.

With respect to a fixed state $|x\rangle \in H_n$, observables $A$ and $B$ define probability distributions $P = (p_1, \ldots, p_n)$ and $Q = (q_1, \ldots, q_n)$ that describe the probabilities of seeing labels $\lambda_1, \ldots, \lambda_n$ and $\mu_1, \ldots, \mu_n$. Notice that the projections $E(\lambda_i)$ and $E(\mu_i)$ are $|a_i\rangle\langle a_i|$ and $|b_i\rangle\langle b_i|$, respectively. Therefore, by measuring observable $A$ when the system is in state $|x\rangle$, label $\lambda_i$ can be observed with a probability of
$$p_i = \langle x \mid\, |a_i\rangle\langle a_i|\, x \rangle = \langle x \mid \langle a_i \mid x \rangle a_i \rangle = |\langle a_i \mid x \rangle|^2,$$
and, by measuring $B$, label $\mu_i$ is seen with a probability of $q_i = |\langle b_i \mid x \rangle|^2$. The probability distributions $P = (p_1, \ldots, p_n)$ and $Q = (q_1, \ldots, q_n)$ obey the following state-independent law:
Theorem 7.3.3 (Entropic Uncertainty Relation).
$$H(P) + H(Q) \ge -2\log c,$$
where $c = \max_{i,j} |\langle a_i \mid b_j \rangle|$.

Proof. The expressions $H(P)$ and $H(Q)$ seem somewhat difficult to estimate, so we should find something to replace them. Let $r, s > 0$ and study the expressions
$$\Bigl(\sum_{i=1}^{n} p_i^{r+1}\Bigr)^{\frac{1}{r}} \quad\text{and}\quad \Bigl(\sum_{i=1}^{n} q_i^{s+1}\Bigr)^{\frac{1}{s}}. \tag{7.18}$$
We claim that
$$\lim_{r \to 0} \frac{\log\Bigl(\sum_{i=1}^{n} p_i^{r+1}\Bigr)}{r} = \sum_{i=1}^{n} p_i \log p_i. \tag{7.19}$$
In fact, since $\sum_{i=1}^{n} p_i = 1$, the numerator and denominator of the left-hand side of (7.19) both tend to 0 as $r$ does. Therefore, L'Hospital's rule applies, and
$$\lim_{r \to 0} \frac{\log\Bigl(\sum_{i=1}^{n} p_i^{r+1}\Bigr)}{r} = \lim_{r \to 0} \frac{\sum_{i=1}^{n} p_i^{r+1} \log p_i}{\sum_{i=1}^{n} p_i^{r+1}} = \sum_{i=1}^{n} p_i \log p_i.$$
Notice also that (7.19) implies that
$$\lim_{r \to 0} \Bigl(\sum_{i=1}^{n} p_i^{r+1}\Bigr)^{\frac{1}{r}} = \prod_{i=1}^{n} p_i^{p_i},$$
and that the claim is equivalent to $\exp(-H(P) - H(Q)) \le c^2$, which can be written as
$$\prod_{i=1}^{n} p_i^{p_i} \prod_{i=1}^{n} q_i^{q_i} \le c^2. \tag{7.20}$$
Our strategy is to replace the products in (7.20) with expressions similar to those in (7.18), estimate them, and obtain (7.20) by letting $r \to 0$.

To estimate (7.18), we use a theorem by M. Riesz [56]:¹³ if $T : H_n \to H_n$ is a unitary matrix, $y = Tx$, and $1 \le a \le 2$, then
$$\Bigl(\sum_{i=1}^{n} |y_i|^{\frac{a}{a-1}}\Bigr)^{\frac{a-1}{a}} \le c^{\frac{2-a}{a}} \Bigl(\sum_{i=1}^{n} |x_i|^{a}\Bigr)^{\frac{1}{a}},$$
where $c = \max |T_{ij}|$.

By choosing $x_i = \langle a_i \mid x \rangle$ and $T_{ij} = \langle b_i \mid a_j \rangle$, we have $y_i = \langle b_i \mid x \rangle$. Clearly, $T$ is unitary. We can now write the theorem in the form
$$\Bigl(\sum_{i=1}^{n} |\langle b_i \mid x \rangle|^{\frac{a}{a-1}}\Bigr)^{\frac{a-1}{a}} \Bigl(\sum_{i=1}^{n} |\langle a_i \mid x \rangle|^{a}\Bigr)^{-\frac{1}{a}} \le c^{\frac{2-a}{a}}. \tag{7.21}$$
Since (7.21) can also be raised to some positive power $k$, we will try to fix values such that (7.21) would begin to resemble the expressions in (7.18). Therefore, we will search for numbers $r$, $s$, and $k$ such that $\frac{a}{a-1} = 2(s+1)$, $k\frac{a-1}{a} = \frac{1}{s}$, $a = 2(r+1)$, $-\frac{k}{a} = \frac{1}{r}$, and $k\frac{2-a}{a} = 2$. To satisfy $1 \le a \le 2$, we must choose $r \in [-\frac12, 0]$. A simple calculation shows that choosing $s = -\frac{r}{2r+1}$ and $k = -\frac{2r+2}{r}$ will suffice. Thus, (7.21) can be rewritten as
$$\Bigl(\sum_{i=1}^{n} q_i^{s+1}\Bigr)^{\frac{1}{s}} \Bigl(\sum_{i=1}^{n} p_i^{r+1}\Bigr)^{\frac{1}{r}} \le c^2,$$
and, by letting $r \to 0$, we obtain the desired estimate. □

¹³ Riesz's theorem is a special case of Stein's interpolation formula. The elementary proof given by Riesz [56] is based on Hölder's inequality.
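The relation is easy to test numerically. Below is a sanity check (a sketch, not from the book) of Theorem 7.3.3 for the computational and Hadamard-Walsh bases of a single qubit, using base-2 logarithms so that entropies are measured in bits:

```python
import numpy as np

A_basis = np.eye(2)                                  # {|0>, |1>}
B_basis = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard-Walsh basis

def entropy(p):
    p = p[p > 1e-12]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(1)
x = rng.normal(size=2) + 1j * rng.normal(size=2)
x /= np.linalg.norm(x)                               # a random pure state

P = np.abs(A_basis.conj().T @ x) ** 2                # p_i = |<a_i|x>|^2
Q = np.abs(B_basis.conj().T @ x) ** 2                # q_i = |<b_i|x>|^2
c = np.max(np.abs(A_basis.conj().T @ B_basis))       # c = 1/sqrt(2)
print(entropy(P) + entropy(Q) >= -2 * np.log2(c))    # True: sum >= 1 bit
```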
Remark 7.3.3. The lower bound of Theorem 7.3.3 does not depend on the system state $x$, but it may depend on the chosen representations
$$A = \lambda_1 |a_1\rangle\langle a_1| + \cdots + \lambda_n |a_n\rangle\langle a_n| \quad\text{and}\quad B = \mu_1 |b_1\rangle\langle b_1| + \cdots + \mu_n |b_n\rangle\langle b_n|$$
of the observables. Independence of the representation is obtained if both observables are non-degenerate. Then, all the vectors $a_i$ and $b_i$ are determined uniquely up to a multiplicative factor having absolute value 1; hence $c = \max |\langle b_i \mid a_j \rangle|$ is also uniquely determined. Notice also that the degeneracy of an observable means that there are fewer than $n$ values to be observed, so naturally the entropy decreases.
Example 7.3.3. Consider a system consisting of $m$ qubits. One orthonormal basis of $H_{2^m}$ is the familiar computational basis $B_1 = \{ |x_1 \ldots x_m\rangle \mid x_i \in \{0, 1\} \}$, and another one is $B_2 = \{ H|x\rangle \mid |x\rangle \in B_1 \}$, which is obtained by applying the Hadamard-Walsh transform (see Section 3.1.2 or 8.2.3)
$$H|x\rangle = \frac{1}{\sqrt{2^m}} \sum_{y \in \mathbb{F}_2^m} (-1)^{x \cdot y} |y\rangle$$
to $B_1$. We will study observables $A_1$ and $A_2$, which are determined by $B_1$ and $B_2$:
$$A_i = \sum_{x \in B_i} c_x^{(i)} |x\rangle\langle x|.$$
Now we do not care very much about the labels $c_x^{(i)}$; they can be fixed in any manner such that both sequences $c_x^{(i)}$ consist of $2^m$ distinct real numbers. This, in fact, is to regard an observable as a decomposition of $H_{2^m}$ into $2^m$ mutually orthogonal one-dimensional subspaces. For any $|x\rangle \in B_1$ and $|x'\rangle \in B_2$, we have
$$|\langle x \mid x' \rangle| = \frac{1}{\sqrt{2^m}},$$
as is easily verified. Thus, the entropic uncertainty relation gives
$$H(P) + H(Q) \ge -2\log 2^{-m/2} = m\log 2.$$
That is, the sum of the uncertainties of observing an $m$-qubit state in bases $B_1$ and $B_2$ is at least $m$ bits! On the other hand, the uncertainty of a single observation naturally cannot be more than $m$ bits: an observation with respect to any basis will give us a sequence $x = x_1 \ldots x_m \in \{0, 1\}^m$ with some probabilities $p_x$ such that $\sum_{x \in \mathbb{F}_2^m} p_x = 1$. The Shannon entropy is given by
$$H(P) = -\sum_{x \in \mathbb{F}_2^m} p_x \log p_x,$$
which achieves its maximum at $p_x = \frac{1}{2^m}$ for each $x \in \{0, 1\}^m$. Thus, the maximum uncertainty is $m$ bits.
The lower bound of the uncertainty principle (Theorem 7.3.2) depends on the commutator $[A, B]$ and on the state $|x\rangle$. If $[A, B] = 0$, then the lower bound is always trivial. Does a similar result hold for the entropic uncertainty relation as well? The answer is yes: if $A$ and $B$ commute, then, according to Lemma 7.2.5, $A$ and $B$ share an orthonormal eigenvector basis. But then $c = \max |\langle b_i \mid a_j \rangle| = 1$, and the lower bound is trivial.
7.4 Quantum States as Operators
Let us study a system of two qubits in state
$$\frac{1}{\sqrt{2}}(|0\rangle|0\rangle + |1\rangle|1\rangle). \tag{7.22}$$
In the introductory section, we learned that this state is entangled; there is no way to write (7.22) as a product of two single-qubit states. This also means that the representation of quantum states as Hilbert space vectors does not offer us any vocabulary to speak about the state of the first qubit if the compound system is in state (7.22). In this section, we will generalize the notion of a quantum state.
7.4.1 Density Matrices
Let $\{b_1, \ldots, b_n\}$ be a fixed basis of $H_n$. In this section, the coordinate representation of a vector $x = x_1 b_1 + \cdots + x_n b_n$ is interpreted as a column vector $(x_1, \ldots, x_n)^T$, and we define the dual vector of $|x\rangle$ as the row vector $\langle x| = (x_1^*, \ldots, x_n^*)$. Notice that, for any $x = x_1 b_1 + \cdots + x_n b_n$ and $y = y_1 b_1 + \cdots + y_n b_n$, $\langle x \mid y \rangle$ as a matrix product looks like
$$\langle x \mid y \rangle = (x_1^*, \ldots, x_n^*) \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}.$$
Inspired by the word "bracket", we will call the row vectors $\langle x|$ bra-vectors and the column vectors $|y\rangle$ ket-vectors.¹⁴

¹⁴ This terminology was first used by P. A. M. Dirac.
Remark 7.4.1. A coordinate-independent notion of dual vectors can be achieved by using the theorem of Fréchet and F. Riesz. Any vector $x \in H_n$ defines a linear function $f_x : H_n \to \mathbb{C}$ by
$$f_x(y) = \langle x \mid y \rangle. \tag{7.23}$$
A linear function $f : H_n \to \mathbb{C}$ is called a functional. Functionals form a vector space $H_n^*$, the so-called dual space of $H_n$, where addition and scalar multiplication are defined pointwise. The Fréchet-Riesz theorem states that, for any functional $f : H_n \to \mathbb{C}$, there is a unique vector $x \in H_n$ such that $f = f_x$. According to this point of view, the dual vector of $x$ is the functional $f_x$ that satisfies (7.23).
For any state (vector of unit length) $x = x_1 b_1 + \cdots + x_n b_n \in H_n$, we define the density matrix belonging to $x$ by
$$|x\rangle \otimes \langle x| = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \otimes (x_1^*, x_2^*, \ldots, x_n^*) = \begin{pmatrix} |x_1|^2 & x_1 x_2^* & \cdots & x_1 x_n^* \\ x_2 x_1^* & |x_2|^2 & \cdots & x_2 x_n^* \\ \vdots & & \ddots & \vdots \\ x_n x_1^* & x_n x_2^* & \cdots & |x_n|^2 \end{pmatrix}.$$
Remark 7.4.2. For density matrices, there is also a coordinate-independent viewpoint: it is not difficult to verify that $|x\rangle \otimes \langle x|$ is a matrix representing the projection onto the one-dimensional subspace generated by $x$, and we have already used the notation $|x\rangle\langle x|$ to stand for such a projection. To shorten the notations, we will omit $\otimes$ and write $|x\rangle \otimes \langle x| = |x\rangle\langle x|$ hereafter. Recall also that we called states $|x\rangle$ and $|x_1\rangle$ equal if $x = e^{i\theta} x_1$ for some real number $\theta$; the density matrices are robust against the factor $e^{i\theta}$, i.e., $|x_1\rangle\langle x_1| = |x\rangle\langle x|$. On the other hand, for a one-dimensional subspace $W$, we can pick a unit-length vector $x$ which generates $W$. This vector is unique up to a multiplicative constant $e^{i\theta}$, and the projection onto $W$ is $|x\rangle\langle x|$. Thus, there is a bijective correspondence between the states and the density matrices belonging to unit vectors $x$.
We list three important properties of the density matrices $|x\rangle\langle x|$:
• Since $\|x\| = 1$, matrices $|x\rangle\langle x|$ have unit trace (recall that the trace is the sum of the diagonal elements).
• As projections, matrices $|x\rangle\langle x|$ are self-adjoint.
• Matrices $|x\rangle\langle x|$ are positive, since $(|x\rangle\langle x|)^2 = |x\rangle\langle x|$ (for a self-adjoint $P$ with $P^2 = P$, we have $\langle y \mid Py \rangle = \|Py\|^2 \ge 0$).
Using these properties, we generalize the notion of a state:

Definition 7.4.1. A state of a quantum system is a positive, unit-trace self-adjoint operator on $H_n$. Such an operator is called a density operator.

Hereafter, any matrix representing a state is called a density matrix as well. Spectral representation offers a powerful tool for studying the properties of density matrices.

If $\rho$ is a density matrix having eigenvalues $\lambda_1, \ldots, \lambda_n$, then there is an orthonormal set $\{x_1, \ldots, x_n\}$ of eigenvectors of $\rho$. Thus, $\rho$ has a spectral representation
$$\rho = \lambda_1 |x_1\rangle\langle x_1| + \cdots + \lambda_n |x_n\rangle\langle x_n|. \tag{7.24}$$
The spectral representation (7.24) in fact tells us that each state (density matrix) is a linear combination of one-dimensional projections, which are alternative notations for quantum states defined as unit-length vectors. We will call the states corresponding to one-dimensional projections vector states or pure states. States that are not pure are referred to as mixed.

More information about density matrices can easily be obtained: using the eigenvector basis, it is trivial to see that $\lambda_1 + \cdots + \lambda_n = \mathrm{Tr}(\rho) = 1$. Since $\rho$ is positive, we also see that $\lambda_i = \langle x_i \mid \rho x_i \rangle \ge 0$. Thus, we can say that each mixed state is a convex combination of pure states.

Thus, a general density matrix looks very much like a probability distribution over orthogonal pure states. In fact, it is true that each probability distribution $(p_1, \ldots, p_n)$ of mutually orthogonal vector states $x_1, \ldots, x_n$ always defines a density matrix
$$\rho = p_1 |x_1\rangle\langle x_1| + \cdots + p_n |x_n\rangle\langle x_n|,$$
but the converse is not so clear. For there is one unfortunate feature in the spectral decomposition (7.24). Recall that (7.24) is unique if and only if $\rho$
has distinct eigenvalues. For instance, if $\lambda_1 = \lambda_2 = \lambda$, then it is easy to verify that we can always replace
$$\lambda |x_1\rangle\langle x_1| + \lambda |x_2\rangle\langle x_2|$$
with
$$\lambda |x'_1\rangle\langle x'_1| + \lambda |x'_2\rangle\langle x'_2|,$$
where
$$x'_1 = x_1 \cos\theta - x_2 \sin\theta, \qquad x'_2 = x_1 \sin\theta + x_2 \cos\theta,$$
and $\theta$ is a real number (Exercise 5). Therefore, we cannot generally say, without additional information, that a density matrix represents a probability distribution over orthogonal state vectors.
We describe the time evolution of the generalized states only briefly at first; afterwards, in Section 7.4.3, we will study the description of the time evolution in more detail. Here we merely use decomposition (7.24) straightforwardly: for a pure state $|x\rangle\langle x|$ there is a unitary mapping $U(t)$ such that
$$|x(t)\rangle\langle x(t)| = |U(t)x(0)\rangle\langle U(t)x(0)| = U(t)\,|x(0)\rangle\langle x(0)|\,U(t)^* \tag{7.25}$$
(see Exercise 6). By extending this linearly to mixed states $\rho$, we have
$$\rho(t) = U(t)\rho(0)U(t)^*,$$
where $U(t)$ is the time-evolution mapping of the system. By using the representation $U(t) = e^{-itH}$, it is quite easy to verify (Exercise 7) that the Schrödinger equation takes the form
$$i\frac{d}{dt}\rho(t) = [H, \rho(t)].$$
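The conjugation rule $\rho(t) = U(t)\rho(0)U(t)^*$ is straightforward to check numerically. Below is a sketch with an assumed example Hamiltonian (not from the book):

```python
import numpy as np
from scipy.linalg import expm

H = np.array([[0.0, 1.0],
              [1.0, 0.0]])
rho0 = np.array([[1.0, 0.0],
                 [0.0, 0.0]], dtype=complex)   # the pure state |0><0|

U = expm(-1j * 0.5 * H)                        # U(t) = exp(-itH), t = 0.5
rho_t = U @ rho0 @ U.conj().T

print(np.isclose(np.trace(rho_t).real, 1.0))   # True: unit trace is preserved
print(np.allclose(rho_t, rho_t.conj().T))      # True: still self-adjoint
```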
Next, we will demonstrate how to fit the concept of observables into the framework of mixed states. For that purpose, we will regard an observable as a self-adjoint operator $A$ with spectral representation
$$A = \theta_1 E(\theta_1) + \cdots + \theta_n E(\theta_n),$$
where each $E(\theta_i) = |x_i\rangle\langle x_i|$ is a projection such that the vectors $x_1, \ldots, x_n$ form an orthonormal basis of $H_n$. If the quantum system is in a state $x$ (pure state $|x\rangle\langle x|$), we can write $x = c_1 x_1 + \cdots + c_n x_n$, where $c_i = \langle x_i \mid x \rangle$. Thus, the probability of observing a label $\theta_i$ is
$$\langle x \mid E(\theta_i)x \rangle = \Bigl\langle \sum_{j=1}^{n} \langle x_j \mid x \rangle x_j \Bigm| E(\theta_i)x \Bigr\rangle = \sum_{j=1}^{n} \langle x_j \mid E(\theta_i)\langle x \mid x_j \rangle x \rangle = \sum_{j=1}^{n} \langle x_j \mid E(\theta_i)\,|x\rangle\langle x|\, x_j \rangle = \mathrm{Tr}(E(\theta_i)\,|x\rangle\langle x|). \tag{7.26}$$
Any state $T$ can be expressed as a convex combination
$$T = \lambda_1 |x_1\rangle\langle x_1| + \cdots + \lambda_n |x_n\rangle\langle x_n| \tag{7.27}$$
of pure states $|x_i\rangle\langle x_i|$, and we use the linearity of the trace to join the concept of observables together with the notion of states as self-adjoint mappings. Let $T$ be a state of a quantum system and $E$ an observable which is considered as a projection-valued measure. The probability that a real number (label) in the set $X$ is observed is $\mathrm{Tr}(E(X)T) = \mathrm{Tr}(TE(X))$. Notice that, since $\mathrm{Tr}(T) = 1$ and $E(X)$ is a projection, we always have $0 \le \mathrm{Tr}(TE(X)) \le 1$. This can be immediately seen by using the linearity of the trace and the spectral representation (7.27) for $T$:
$$\mathrm{Tr}(TE(X)) = \sum_{i=1}^{n} \lambda_i \mathrm{Tr}(|x_i\rangle\langle x_i|\, E(X)) = \sum_{i=1}^{n} \lambda_i \langle x_i \mid E(X)x_i \rangle.$$
The inequalities $0 \le \mathrm{Tr}(TE(X)) \le 1$ now follow directly from the facts that $0 \le \langle x_i \mid E(X)x_i \rangle \le 1$, $\lambda_i \ge 0$, and $\sum_{i=1}^{n} \lambda_i = 1$.

Thus, any state $T$ induces a probability measure on the set of projections of the Hilbert space $H_n$ by $\mu_T(P) = \mathrm{Tr}(TP)$. If $n \ge 3$, the converse also holds.

Theorem 7.4.1 (A. Gleason, 1957). If $\dim H \ge 3$, then each probability measure $\mu : \mathcal{P}(H) \to [0, 1]$ satisfying
1. $\mu(0) = 0$, $\mu(I) = 1$,
2. $\mu\bigl(\sum P_i\bigr) = \sum \mu(P_i)$
for each sequence of mutually orthogonal projections $P_i$ is generated via a state $T$ by $\mu_T(P) = \mathrm{Tr}(TP)$.

The proof of the above theorem can be found in [49].
7.4.2 Subsystem States
We will now study the description of compound systems in more detail. Let $H_n$ and $H_m$ be the state spaces of two distinguishable quantum systems. As discussed before, the state space of the compound system is the tensor product $H_n \otimes H_m$. A state of the compound system is a self-adjoint mapping on $H_n \otimes H_m$. Such a state can be composed of states of the subsystems in the following way: if $T_1$ and $T_2$ are states (also viewed as self-adjoint mappings) of $H_n$ and $H_m$, then the tensor product $T_1 \otimes T_2$ defines a self-adjoint mapping on $H_n \otimes H_m$, as is easily seen ($T_1 \otimes T_2$ is defined by
$$(T_1 \otimes T_2)(x_i \otimes y_j) = T_1 x_i \otimes T_2 y_j$$
for the basis states and extended to $H_n \otimes H_m$ by using linearity requirements). One can also easily verify that the matrix of $T_1 \otimes T_2$ is the tensor product of the matrices of $T_1$ and $T_2$; in particular, if $T_1$ and $T_2$ are unit-trace positive self-adjoint mappings, then $T_1 \otimes T_2$ is a positive, unit-trace self-adjoint mapping on $H_n \otimes H_m$. Likewise, observables of the subsystems make up an observable of the compound system: if $A_1$ and $A_2$ are observables of the subsystems, then $A_1 \otimes A_2$ is a self-adjoint mapping on $H_n \otimes H_m$.

The identity mapping $I : H \to H$ is a positive self-adjoint mapping and, therefore, defines an observable. However, this observable is a most uninformative one in the following sense: the corresponding projection-valued measure is defined by
$$E_I(X) = \begin{cases} I, & \text{if } 1 \in X, \\ 0, & \text{otherwise}, \end{cases}$$
so the probability distribution associated with $E_I$ is trivial: $\mathrm{Tr}(TE_I(X))$ is 1 if $1 \in X$, and 0 otherwise. Notice also that $I$ has only one eigenvalue, 1, but all the vectors of $H$ are eigenvectors of $I$. This is strongly reflected in the corresponding decomposition of $H$ into orthogonal subspaces: the decomposition has only one component, $H$ itself. This corresponds to the idea that, by measuring observable $I$, we are not observing any nontrivial property. In fact, if $T_1$ and $T_2$ are the states of $H_n$ and $H_m$, and $A_1$ and $A_2$ are some observables, then it is easy to see that
$$\mathrm{Tr}((T_1 \otimes T_2)(A_1 \otimes I)) = \mathrm{Tr}(T_1 A_1),$$
$$\mathrm{Tr}((T_1 \otimes T_2)(I \otimes A_2)) = \mathrm{Tr}(T_2 A_2).$$
That is, observing the compound observable $A_1 \otimes I$ (resp. $I \otimes A_2$) corresponds to observing the first (resp. second) system only. Based on this idea, we will define the substates of a compound system.
Definition 7.4.2. Let $T$ be a state of a compound system with state space $H_n \otimes H_m$. The states of the subsystems are unit-trace positive self-adjoint mappings $T_1 : H_n \to H_n$ and $T_2 : H_m \to H_m$ such that, for any observables $A_1$ and $A_2$,
$$\mathrm{Tr}(T_1 A_1) = \mathrm{Tr}(T(A_1 \otimes I)), \tag{7.28}$$
$$\mathrm{Tr}(T_2 A_2) = \mathrm{Tr}(T(I \otimes A_2)). \tag{7.29}$$
We say that $T_1$ is obtained by tracing over $H_m$, and $T_2$ by tracing over $H_n$.
We can, of course, be skeptical about the existence and uniqueness of $T_1$ and $T_2$, but, fortunately, they are always uniquely determined.

Theorem 7.4.2. For each state $T$ there are unique states $T_1$ and $T_2$ satisfying (7.28) and (7.29) for all observables $A_1$ and $A_2$.

Proof. We will only show that $T_1$ exists and is unique; the proof for $T_2$ is symmetric. We will first search for a mapping $T_1$ that satisfies
$$\mathrm{Tr}(T_1 A) = \mathrm{Tr}(T(A \otimes I)) \tag{7.30}$$
for any linear mapping $A : H_n \to H_n$. For that purpose, we fix orthonormal bases $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_m\}$ of $H_n$ and $H_m$, respectively. Using these bases, equation (7.30) can be rewritten as
$$\sum_{i=1}^{n} \langle x_i \mid T_1 Ax_i \rangle = \sum_{i=1}^{n} \sum_{j=1}^{m} \langle x_i \otimes y_j \mid T(A \otimes I)(x_i \otimes y_j) \rangle = \sum_{i=1}^{n} \sum_{j=1}^{m} \langle x_i \otimes y_j \mid T(Ax_i \otimes y_j) \rangle. \tag{7.31}$$
It is easy to see that the linear mappings $H_n \to H_n$ form a vector space which is generated by the mappings $|x_i\rangle\langle x_j|$. In fact, the matrix representation of mapping $|x_i\rangle\langle x_j|$ is just the matrix having 0s elsewhere but 1 at the intersection of the $i$th row and the $j$th column. Therefore, $T_1$ can be represented as
$$T_1 = \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} |x_i\rangle\langle x_j|, \tag{7.32}$$
where the $c_{ij}$ are complex numbers.

By choosing $A = |x_k\rangle\langle x_l|$ and substituting (7.32) into (7.31), we obtain
$$c_{lk} = \sum_{j=1}^{m} \langle x_l \otimes y_j \mid T(x_k \otimes y_j) \rangle,$$
which gives
$$T_1 = \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{m} \langle x_i \otimes y_k \mid T(x_j \otimes y_k) \rangle\, |x_i\rangle\langle x_j| \tag{7.33}$$
as a candidate for $T_1$. In fact, mapping (7.33) is self-adjoint:
n
T{
n
m
2:2: 2: (T(xj 0Yk)
=
1
Xi0Yk)
IXj)(Xil
i=l j=l k=l
n
n
m
2: L
=
L(Xj 0 Yk 1 T(Xi 0 Yk)) IXj)(Xi I
i=l j=l k=l
= Tl ·
By choosing A = I, we also see that Tr(Tr) = 1. What about the positivity
of Tl? Analogously to the formula (7.26) we have
(x 1TlX)
= Tr(Tl
Ix)(xl)
= Tr(T(lx)(xI0I))
2: 0,
so Tl is also positive. This proves the existence of the required state T l .
Assume then, on the contrary, that there is another state T{ such that
Tr(TlA) = Tr(T{ A) for each self-adjoint A. By choosing A to be a projection
1x)(x I, we have that (x 1 (Tl - T{)x) = 0 for any unit-length x. Mapping
Tl -T{ is also self-adjoint, so there is an orthonormal basis of Hn consisting of
eigenvectors of Tl - T{. If all the eigenvalues of Tl - T{ are 0, then Tl - T{ = 0
and the proof is complete. In the opposite case, there is a nonzero eigenvalue
Al of Tl - T{ and a unit-length eigenvector Xl belonging to AI. Then
o
a contradiction.
Let us write $T_1 = \mathrm{Tr}_{H_m}(T)$ for the state obtained by tracing over $H_m$ and, similarly, $T_2 = \mathrm{Tr}_{H_n}(T)$ for the state which we get by tracing over $H_n$. By collecting the facts from the previous proof, we see that
$$T_1 = \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{m} \langle x_j \otimes y_k \mid T(x_i \otimes y_k) \rangle\, |x_j\rangle\langle x_i|. \tag{7.34}$$
Similarly,
$$T_2 = \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{k=1}^{n} \langle x_k \otimes y_i \mid T(x_k \otimes y_j) \rangle\, |y_i\rangle\langle y_j|. \tag{7.35}$$
It is easy to see that the tracing-over operation is linear: $\mathrm{Tr}_{H_n}(\alpha S + \beta T) = \alpha\,\mathrm{Tr}_{H_n}(S) + \beta\,\mathrm{Tr}_{H_n}(T)$.
Example 7.4.1. Let the notation be as in the previous theorem. If $S$ is a state of the compound system $H_n \otimes H_m$, then $S$ is, in particular, a linear mapping $H_n \otimes H_m \to H_n \otimes H_m$. Therefore, $S$ can be uniquely represented as
$$S = \sum_{r=1}^{n} \sum_{t=1}^{m} \sum_{s=1}^{n} \sum_{u=1}^{m} S_{rstu} |x_r \otimes y_t\rangle\langle x_s \otimes y_u|. \tag{7.36}$$
It is easy to verify that $|x_r \otimes y_t\rangle\langle x_s \otimes y_u| = |x_r\rangle\langle x_s| \otimes |y_t\rangle\langle y_u|$, so we can write
$$S = \sum_{r=1}^{n} \sum_{s=1}^{n} |x_r\rangle\langle x_s| \otimes \sum_{t=1}^{m} \sum_{u=1}^{m} S_{rstu} |y_t\rangle\langle y_u| = \sum_{r=1}^{n} \sum_{s=1}^{n} |x_r\rangle\langle x_s| \otimes S_{rs}, \tag{7.37}$$
where each $S_{rs} \in L(H_m)$ (recall that notation $L(H_m)$ stands for the linear mappings $H_m \to H_m$). It is worth noticing that the latest expression for $S$ is actually a decomposition of the $nm \times nm$ matrix (7.36) into an $n \times n$ block matrix having $m \times m$ matrices $S_{rs}$ as entries.

Substituting (7.37) into (7.35) we get, after some calculation, that
$$S_2 = \sum_{r=1}^{n} \sum_{i=1}^{m} \sum_{j=1}^{m} \langle y_i \mid S_{rr} y_j \rangle\, |y_i\rangle\langle y_j| = \sum_{r=1}^{n} S_{rr}.$$
The above expression can be interpreted as follows: if the density matrix corresponding to the state $S$ is viewed as an $n \times n$ block matrix with $m \times m$ blocks $S_{rs}$, then the density matrix corresponding to the state $S_2$ of the second subsystem can be obtained by summing together all the diagonal blocks $S_{rr}$.
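The block-sum rule is short to implement. Below is a sketch (not the book's code) that computes the second subsystem state of the entangled state (7.22) this way:

```python
import numpy as np

def trace_over_first(S, n, m):
    """Return S2, the sum of the diagonal m x m blocks S_rr."""
    S = S.reshape(n, m, n, m)        # indices (r, t, s, u) as in (7.36)
    return np.einsum('rtru->tu', S)  # sum over r of the blocks S_rr

# The entangled state (7.22): (|00> + |11>)/sqrt(2).
z = np.zeros(4, dtype=complex)
z[0] = z[3] = 1 / np.sqrt(2)
S = np.outer(z, z.conj())            # the pure state |z><z|

print(trace_over_first(S, 2, 2).real)   # [[0.5 0. ], [0.  0.5]]: the state I/2
```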
Example 7.4.2. Let us investigate the compound system $H_2 \otimes H_2$ of two qubits. We choose $\{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}$ as the orthonormal basis of $H_2 \otimes H_2$. Consider the (pure) state $y = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ of the compound system. By using (7.34) and (7.35), we see that the states of the subsystems are both
$$S' = \tfrac12 |0\rangle\langle 0| + \tfrac12 |1\rangle\langle 1|.$$
By using the familiar coordinate representations $|00\rangle = (1, 0, 0, 0)^T$, $|01\rangle = (0, 1, 0, 0)^T$, $|10\rangle = (0, 0, 1, 0)^T$, and $|11\rangle = (0, 0, 0, 1)^T$, we have that the coordinate representation of $y$ is $(\frac{1}{\sqrt{2}}, 0, 0, \frac{1}{\sqrt{2}})^T$, and it is easy to see that the density matrix corresponding to the pure state $|y\rangle\langle y|$ is
$$M_{|y\rangle\langle y|} = \frac12 \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix},$$
and the density matrices corresponding to the states $S'$ are
$$M_{S'} = \frac12 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$
which agrees with the previous example. On the other hand, if we try to reconstruct the state of $H_2 \otimes H_2$ from the states $S'$, we find out that
$$M_{S'} \otimes M_{S'} = \frac14 \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$
which is different from $M_{|y\rangle\langle y|}$. This demonstrates that the subsystem states $S'$ obtained by tracing over are not enough to determine the state of the compound system.
Even though the state of the compound system perfectly determines the
subsystem states, there is no way to obtain the whole system state from the
partial systems without additional information.
Example 7.4.3. Let the notation be as before. If $z \in H_n \otimes H_m$ is a unit-length vector, then $|z\rangle\langle z|$ corresponds to a pure state of the system $H_n \otimes H_m$. For each pair $(x, z) \in H_n \times (H_n \otimes H_m)$ there exists a unique vector $y \in H_m$ such that
$$\langle y' \mid y \rangle = \langle x \otimes y' \mid z \rangle$$
for each $y' \in H_m$. In fact, if $y_1$ and $y_2$ were two such vectors, then $\langle y' \mid y_1 - y_2 \rangle = 0$ for each $y' \in H_m$, and $y_1 = y_2$ follows immediately. On the other hand,
$$y = \sum_{k=1}^{m} \langle x \otimes y_k \mid z \rangle\, y_k$$
clearly satisfies the required condition. We define a mapping $(\cdot\,, \cdot) : H_n \times (H_n \otimes H_m) \to H_m$ by
$$(x, z) = \sum_{k=1}^{m} \langle x \otimes y_k \mid z \rangle\, y_k.$$
Clearly $(\cdot\,, \cdot)$ has properties that resemble the inner product, like linearity with respect to the second component and anti-linearity¹⁵ with respect to the first component.

Now, substituting $S = |z\rangle\langle z|$ into (7.35), we see that
$$S_2 = \sum_{k=1}^{n} |(x_k, z)\rangle\langle (x_k, z)|. \tag{7.38}$$
Comparing (7.38) with the trace formula
$$\mathrm{Tr}(S) = \sum_{k=1}^{n} \langle x_k \mid Sx_k \rangle$$
somehow explains the name "tracing over". Notice also that if
$$z = \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij}\, x_i \otimes y_j = \sum_{i=1}^{n} x_i \otimes \sum_{j=1}^{m} c_{ij}\, y_j,$$
then clearly $(x_k, z) = \sum_{j=1}^{m} c_{kj}\, y_j$ and $S_2$ becomes
$$S_2 = \sum_{k=1}^{n} \Bigl|\sum_{j=1}^{m} c_{kj}\, y_j\Bigr\rangle\Bigl\langle\sum_{j=1}^{m} c_{kj}\, y_j\Bigr|.$$

¹⁵ Anti-linearity here, as usual, means that $(a_1 x_1 + a_2 x_2, z) = a_1^*(x_1, z) + a_2^*(x_2, z)$.
7.4.3 More on Time Evolution
We already know how to interpret abstract mathematical concepts such as a state (positive, unit-trace self-adjoint operator) and an observable (which can be seen as a self-adjoint operator). Since these concepts refer to physical objects, it is equally important to know how the description of the system changes in time. In other words, we should know how to describe the dynamical evolution of the system. We have already outlined in Section 7.3.1 how the time evolution can be described in the state vector formalism. Here we will discuss the time evolution in the general formalism in more detail.

In the so-called Schrödinger picture, the state of the system may change in time but the observables remain fixed, whereas in the Heisenberg picture the observables change. It turns out that both pictures are mathematically equivalent, and here we will adopt the Schrödinger picture.

We will also adopt, as in Section 7.3.1, the causality principle, which tells us that the state of the system at any time $t$ can be recovered from the state at a given time $t = 0$. Mathematically speaking, this means that if the state of the system $H_n$ at $t = 0$ is $S_0$, there exists a function $V_t : L(H_n) \to L(H_n)$ such that the state at time $t$ is $S_t = V_t(S_0)$.

It follows that each $V_t$ should preserve self-adjointness, trace, and positivity. A requirement potentially subject to more criticism is that each $V_t$ is required to be linear.

It is also quite common to require that each $V_t$ is completely positive. To define the concept of a completely positive linear mapping $L(H_n) \to L(H_n)$, we first consider some basic properties of tensor products.

For some natural number $m \ge 1$, let $H_m$ be a Hilbert space with orthonormal basis $\{y_1, \ldots, y_m\}$. If $\{x_1, \ldots, x_n\}$ is also an orthonormal basis of $H_n$, then
$$\{ y_i \otimes x_j \mid i \in \{1, \ldots, m\},\ j \in \{1, \ldots, n\} \}$$
is a natural choice for an orthonormal basis of $H_m \otimes H_n$. We will keep this notation throughout this section.
Each linear mapping $W \in L(H_m \otimes H_n)$ can be uniquely represented as
$$W = \sum_{r=1}^{m} \sum_{s=1}^{m} |y_r\rangle\langle y_s| \otimes A_{rs}, \tag{7.39}$$
where each $A_{rs} \in L(H_n)$ (see Example 7.4.1). Whenever $W$ is represented as in (7.39), knowing each $A_{rs}$ uniquely determines $W$ and vice versa, since
$$\langle y_k \otimes x_i \mid W(y_l \otimes x_j) \rangle = \langle x_i \mid A_{kl} x_j \rangle,$$
and any linear mapping $A_{kl} \in L(H_n)$ is completely determined by the values $\langle x_i \mid A_{kl} x_j \rangle$. In fact, this is already quite clear from the fact that (7.39) actually represents the $mn \times mn$ matrix $W$ as an $m \times m$ matrix with $n \times n$ matrices as entries.

If now $V : L(H_n) \to L(H_n)$ is a linear mapping and $W$ is as in (7.39), we define another mapping $I_m \otimes V : L(H_m \otimes H_n) \to L(H_m \otimes H_n)$ by
$$(I_m \otimes V)(W) = \sum_{r=1}^{m} \sum_{s=1}^{m} |y_r\rangle\langle y_s| \otimes V(A_{rs}).$$
Definition 7.4.3. Mapping $V : L(H_n) \to L(H_n)$ is completely positive if, for any $m \ge 1$, the extended mapping $I_m \otimes V$ preserves positivity.

Remark 7.4.3. The intuitive meaning of the notion "completely positive" is the following: instead of merely regarding $T$ as a state of a quantum system $H_n$, we could as well introduce an "environment" $H_m$ and regard $I_m \otimes T$ as a state of a compound system $H_m \otimes H_n$. The requirement that $V$ should be completely positive means that the time evolution on a compound system must preserve positivity as well.
7.4.4 Representation Theorems
We will derive two representation theorems for completely positive mappings. For that purpose, we begin with some auxiliary results.

Lemma 7.4.1. A positive mapping $A \in L(H_n)$ is self-adjoint.

Proof. Positivity of $A$ means that $\langle x \mid Ax \rangle \ge 0$ for each $x \in H_n$. Specifically, $\langle x \mid Ax \rangle$ is real, hence $\langle x \mid Ax \rangle = \langle Ax \mid x \rangle$ for each $x \in H_n$. On the other hand, also $\langle x \mid Ax \rangle = \langle A^* x \mid x \rangle$ for each $x \in H_n$. Writing $B = i(A - A^*)$, we have that $\langle Bx \mid x \rangle = 0$ for any $x \in H_n$. Since $B$ is self-adjoint, there exists an orthonormal basis $\{x_1, \ldots, x_n\}$ consisting of eigenvectors of $B$ and
$$B = \lambda_1 |x_1\rangle\langle x_1| + \cdots + \lambda_n |x_n\rangle\langle x_n|;$$
see the spectral representation (7.8). But then
$$0 = \langle Bx_i \mid x_i \rangle = \lambda_i \langle x_i \mid x_i \rangle = \lambda_i$$
for each $i \in \{1, \ldots, n\}$, and therefore $B = 0$. Equality $A = A^*$ follows immediately. □
Lemma 7.4.2. A mapping $A \in L(H_n)$ is positive if and only if there is an orthonormal basis $\{x_1, \ldots, x_n\}$ of $H_n$ and non-negative numbers $\lambda_1, \ldots, \lambda_n$ such that
$$A = \lambda_1 |x_1\rangle\langle x_1| + \cdots + \lambda_n |x_n\rangle\langle x_n|. \tag{7.40}$$

Proof. Assume that $A$ has representation (7.40). Each $x \in H_n$ can be written as
$$x = a_1 x_1 + \cdots + a_n x_n,$$
and then $\langle x \mid Ax \rangle = \lambda_1 |a_1|^2 + \cdots + \lambda_n |a_n|^2 \ge 0$, hence $A$ is positive.

On the other hand, if $A$ is positive, then $A$ is also self-adjoint by the previous lemma. Using the spectral representation, we see that there is an orthonormal basis $\{x_1, \ldots, x_n\}$ of $H_n$ such that $A$ has the form (7.40) for some real numbers $\lambda_1, \ldots, \lambda_n$. Since $A$ is positive, we have that
$$\lambda_i = \langle x_i \mid Ax_i \rangle \ge 0$$
for each $i$. □
The proof of the following theorem, the first structure theorem, is due to [17].

Theorem 7.4.3. A linear mapping $V : L(H_n) \to L(H_n)$ is completely positive if and only if there exist $n^2$ mappings $V_i \in L(H_n)$ such that
$$V(A) = \sum_{i=1}^{n^2} V_i A V_i^* \tag{7.41}$$
for each $A \in L(H_n)$.

Proof. If $V$ has representation (7.41), we should show that whenever
$$W = \sum_{r=1}^{m} \sum_{s=1}^{m} |y_r\rangle\langle y_s| \otimes A_{rs}$$
is positive, then
$$(I_m \otimes V)(W) = \sum_{r=1}^{m} \sum_{s=1}^{m} |y_r\rangle\langle y_s| \otimes \sum_{i=1}^{n^2} V_i A_{rs} V_i^*$$
is also positive. This is done by straightforward computation: any $z \in H_m \otimes H_n$
can be represented in the form
$$z = \sum_{a=1}^{N} y_a \otimes x_a,$$
where $N \le n^2$ and each $y_a \in H_m$ and $x_a \in H_n$. Writing
$$z_i = \sum_{a=1}^{N} y_a \otimes V_i^* x_a,$$
it is easy to check that
$$\langle z \mid (I_m \otimes V)(W)z \rangle = \sum_{i=1}^{n^2} \langle z_i \mid Wz_i \rangle \ge 0,$$
because $W$ is positive.

For the other direction, let $V$ be a completely positive mapping. Then, in particular, $I_n \otimes V$ should be positive. For clarity, we will denote the basis of one copy of $H_n$ by $\{y_1, \ldots, y_n\}$ and that of the other copy by $\{x_1, \ldots, x_n\}$. Mapping $W \in L(H_n \otimes H_n)$, defined by
$$W = \sum_{r=1}^{n} \sum_{s=1}^{n} |y_r\rangle\langle y_s| \otimes |x_r\rangle\langle x_s| = \Bigl|\sum_{r=1}^{n} y_r \otimes x_r\Bigr\rangle\Bigl\langle\sum_{s=1}^{n} y_s \otimes x_s\Bigr|,$$
is clearly positive, since from the last representation it can be directly read that $W$ is a constant multiple of a one-dimensional projection. Since $V$ is completely positive, it follows that
$$(I_n \otimes V)(W) = \sum_{r=1}^{n} \sum_{s=1}^{n} |y_r\rangle\langle y_s| \otimes V(|x_r\rangle\langle x_s|) \tag{7.42}$$
is a positive mapping. By Lemma 7.4.2, there exists an orthonormal basis $\{v_1, \ldots, v_{n^2}\}$ of $H_n \otimes H_n$ and non-negative numbers $\lambda_1, \ldots, \lambda_{n^2}$ such that
$$(I_n \otimes V)(W) = \sum_{i=1}^{n^2} \lambda_i |v_i\rangle\langle v_i|,$$
which can also be written as
$$(I_n \otimes V)(W) = \sum_{i=1}^{n^2} |\sqrt{\lambda_i}\, v_i\rangle\langle \sqrt{\lambda_i}\, v_i|. \tag{7.43}$$
If
$$v = \sum_{j=1}^{n} \sum_{i=1}^{n} v_{ij}\, y_j \otimes x_i$$
is a vector in $H_n \otimes H_n$, we associate to $v$ a mapping $V_v \in L(H_n)$ by
$$V_v = \sum_{i=1}^{n} \sum_{j=1}^{n} v_{ij} |x_i\rangle\langle x_j|. \tag{7.44}$$
Then, a straightforward computation gives that
$$|v\rangle\langle v| = \sum_{r=1}^{n} \sum_{s=1}^{n} |y_r\rangle\langle y_s| \otimes V_v |x_r\rangle\langle x_s| V_v^*. \tag{7.45}$$
Associating to each $\sqrt{\lambda_i}\, v_i$ in (7.43) a mapping $V_i$ as in (7.44), and using (7.45), we see that
$$(I_n \otimes V)(W) = \sum_{i=1}^{n^2} \sum_{r=1}^{n} \sum_{s=1}^{n} |y_r\rangle\langle y_s| \otimes V_i |x_r\rangle\langle x_s| V_i^* = \sum_{r=1}^{n} \sum_{s=1}^{n} |y_r\rangle\langle y_s| \otimes \sum_{i=1}^{n^2} V_i |x_r\rangle\langle x_s| V_i^*,$$
which, together with (7.42), gives that, for each pair $r$, $s$:
$$V(|x_r\rangle\langle x_s|) = \sum_{i=1}^{n^2} V_i |x_r\rangle\langle x_s| V_i^*.$$
Since the mappings $|x_r\rangle\langle x_s|$ span the whole space $L(H_n)$, the equation
$$V(A) = \sum_{i=1}^{n^2} V_i A V_i^*$$
holds for each $A \in L(H_n)$. □
Another quite useful criterion for complete positiviness can be gathered
from the proof of the previous theorem:
Theorem 7.4.4. A linear mapping V : L(Hn)
tive if and only if
n
T = L
n
L
IYr)(Ys I ®V(Ia::r)(a::s I)
r=1s=1
is a positive mapping Hn ® Hn
~
Hn ® Hn.
~
L(Hn) is completely posi-
140
7. Appendix A: Quantum Physics
Proof. If V E L(Hn) is a completely positive mapping, then T is positive by
the very definition, since
n
n
I: I: IYr)(Ysl 0
r=l s=l
IXr)(xsl
is positive. On the other hand, if T is positive, then a representation
=
V(A)
I: V;AV;*
;=1
can be found as in the proof of the previous theorem.
o
Recall that we also required that V should preserve the trace.
Lemma 7.4.3. Let V : L(Hn) -t L(Hn) be a completely positive mapping
represented as V(A)
2
n
= L;=l
ViA V; *.
2
Then V preserves the trace if and only
= I.
if L':::l V;*V;
Proof. Let {Xl, . .. ,xn } be an orthonormal basis of Hn. Since
n
I
=
I: Ixr)(xrl,
r=l
we obtain by direct calculation that
n2
(Xk
I I: V;*V;Xl)
;=1
r=l
i=l
i=l
r=l
;=1 r=l
n2
=
Tr(I:V; IX1)(Xkl V;*) = Tr(V(IX1)(Xkl)).
i=l
n
If Li=l
V; * V;
2
= I, then
7.4 Quantum States as Operators
141
i.e., V preserves the trace of the mappings I Xl)(Xk I. Since V and the trace
are linear and the mappings I Xl)(Xk I generate the whole space L(Hn),
V preserves all traces. On the other hand, if V preserves all traces, then
2
L~=l
\Ii*\Ii = I, since any linear mapping A
I AXl).
values (Xk
E
L(Hn) is determined by the
0
The second structure theorem for completely positive mappings connects
them to unitary mappings and offers an interesting interpretation.
Theorem 7.4.5. A linear mapping V : L(Hn) -+ L(Hn) is completely
positive and trace preserving if and only if there exists a unitary mapping
U E L(Hn ® Hn2) and a pure state B =Ib)(bl of system L(Hn2) such that
V(A) = TtHn2 (U(A ® B)U*)
for each A E L(Hn).
Remark
7.4.4.
A unitary time evolution
I x)(x It-tl Ux)(Ux I
can be rewritten as
Ix)(xlt-t U Ix)(xi U*,
so the above theorem states that any completely positive L(Hn) -+ L(Hn)
mapping can be interpreted as a unitary time evolution in a larger system.
Proof. Assume first that V(A) = TtHn2 (U(A®B)U*). Let {Xl, ... , xn} and
{YI, ... ,Yn2} be orthonormal bases of Hn and Hn2, respectively. We write
U* in the form
n
U*
=
n
L L
I Xr)(Xs I®U;s
r=ls=l
and define Vk E L(Hn) as
n
n
Vk = LL(UjiYk I b) IXi)(xjl·
i=l j=l
If now
n
n
A= LLaij IXi)(xjl,
i=l j=l
a direct calculation gives that
142
7. Appendix A: Quantum Physics
n
n
n
n
VkAVk * = L L L L(xr I AXt)(U;iYk
i=l j=l r=l t=l
II b)(bl Ut"'jYk)
IXi)(Xj
I·
(7.46)
On the other hand, (7.34) gives that
TrHn2 (U(A ® B)U*
i=l j=l k=l
k=li=lj=lr=lt=l
= L L L L L ( xr I AXt)(U;iYk
k=l i=l j=l r=l t=l
Ilb)(bl Ut"'jYk) IXi)(xjl,
which, using (7.46), can be written as
TrHn2(U(A®B)U*) = LVkAVk*.
k=l
Thus, by Theorem (7.4.3), V is completely positive. To see that V is trace
preserving, notice that a unitary mapping U converts an orthonormal basis
B to another orthonormal basis B', so
Tr(UTU*)
=
L
(z' I UTU* z')
=
z;'EB
=
L
(U* z' I TU* z')
L
z;'EB'
(U-IZ' I TU-1z')
z;'EB'
(z I Tz)
=L
Since "tracing over" TrHn2 : L(Hn ® Hn2) --+ L(Hn ), T
defined by condition
for each projection P E L(Hn ), we can use P
Tr(TrHn2 (U(A ® B)U*))
= Tr(U(A ® B)U*) = Tr(A ® B)
= Tr(T).
z;EB
H
Tl = TrHn2 (T) is
= In to obtain
= Tr(A)Tr(B) = Tr(A).
On the other hand, let V : L(Hn) --+ L(Hn) be a completely positive and
trace-preserving linear mapping. By Theorem 7.4.3, V can be represented as
n2
V(A)
= LViAVi*,
i=l
where each Vi E L(Hn) and
7.4 Quantum States as Operators
143
2: ViVi* = I.
i=l
Fix an arbitrary vector b E Hn2 of unit length. For any x E Hn we define
n2
U(x0b)
= 2:Vix0Yi.
(7.47)
i=l
Clearly U is linear with respect to x. If
gives
i=l
j=l
i=l
j=l
Xl, X2 E
Hn, a direct computation
i=l
i=l
Denoting
Hn 0 b = {x 0 b 1 x E Hn}
~
Hn 0 Hn2
we see that U : Hn 0 b ---* Hn 0 Hn2 is a linear mapping which preserves
inner products. It follows that U can be extended (usually in many ways) to
a unitary mapping U' E L(Hn 0Hn2). We fix one extension and denote that
again by U. If x is any unit-length vector, then
U (I x) (x 10 1b) (b I) U*
= U 1x 0
b) (x 0 b 1U*
= 1U(x 0
b))(U(x 0 b) 1=12: Vix 0 Yi)(2: Vix 0 Yi
i=l
i=l
Using Example 7.4.3 it is easy to see that
I·
i=l
= 2: Vi Ix)(xl Vi* = V(lx)(xl)·
i=l
Hence with U defined as in (7.47) and B =1 b)(b I, the claim holds at least
for all pure states 1x) (x I. Since linear mappings
A
H
TrHn2
(U(A 0 B)U*)
and A H V(A) agree on all pure states and all states can be expressed as
convex combinations of pure states, the mappings must be equal.
0
144
7. Appendix A: Quantum Physics
7.5 Exercises
1. Show that the trace function
n
Tr(A)
= L (Xi I AXi)
i=l
is independent of the chosen orthonormal basis {X 1, ... , x n }.
2. Show that if P is a self-adjoint operator Hn -+ Hn and p 2 = P, then
there is a subspace W E Hn such that P = Pw .
3. If >'1, ... , An are eigenvalues of a self-adjoint operator A and Xl, ... , Xn
the corresponding orthonormal set of eigenvectors, verify that
4. Verify that the Polarization equation
holds. Conclude that if IIAxl1
(x I y) for each x, y E Hn.
5. Prove that
=X
for any
X
E H n , then also (Ax lAy)
=
where
=
Xl
cos a - x2sma,
x; =
Xl
sina + X2 cosa,
I
Xl
.
and a is any real number.
6. Prove that I Ax)(By 1= A I x)(y I B* for each x, y
operators A, B : Hn -+ Hn.
7. Derive the generalized Schrodinger equation
.d
t
dtP(t) = [H, p(t)]
with equation
p(t) = U(t)p(O)U(t)*
and representation U(t)
= e- itH
as starting points.
E
Hn and any
8. Appendix B: Mathematical Background
The purpose of this chapter is to introduce the reader to the basic mathematical notions used in this book.
8.1 Group Theory
8.1.1 Preliminaries
A group G is a set equipped with mapping G x G, G, i.e., with a rule that
unambiguously describes how to create one element of G out of an ordered
pair of given ones. This operation is frequently called the multiplication or the
addition and is denoted by 9 = g1g2 or 9 = gl + g2 respectively. Moreover,
there is a special required element in G, which is called the unit element
or the neutral element and it is usually denoted by 1 in the multiplicative
notation, and 0 in the additive notation.
Furthermore, there is one more required operation, inversion, that sends
any element of 9 into its inverse element g-1 (resp. opposite element -g
when additive notations are used). Finally, the group operations are required
to obey the following group axioms (using multiplicative notations):
1. For all elements gl, g2, and g3, gl(g2g3)
2. For any element g, gl = 19 = g.
3. For any element g, gg-1 = g-lg = 1.
=
(glg2)g3.
If a group G also satisfies
4. For all elements gl and g2, g1g2
= g2g1,
then G is called an abelian group or a commutative group.
To be precise, instead of speaking about group G, we should say that
(G,·, -1,1) is a group. Here G is a set of group elements; . stands for the
multiplication; -1 stands for the inversion, and 1 is the unit element. However,
ifthere is no danger of confusion, we will just use the notation G for the group,
as well as for the underlying set.
Example B.l.l. Integers Z form an abelian group having addition as the
group operation, 0 as the neutral element and mapping n H -n as the
inversion. On the other hand, natural numbers
146
8. Appendix B: Mathematical Background
N={1,2,3, ... }
do not form a group with respect to these operations, since N is not closed
under the inversion n H -n, and there is no neutral element in N.
Example B.l.2. Nonzero complex numbers form an abelian group with respect to multiplication. The neutral element is 1, and the inversion is the
mapping Z H z-l.
Example B.l.3. The set of n x n matrices over C that have a nonzero determinant constitute a group with respect to matrix multiplication. This group,
denoted by GLn(C) and called the general linear group over C, is not abelian
unless n = 1, when the group is essentially the same as in the previous
example.
Regarding group axiom 1, it makes sense to omit the parenthesis and write
just glg2g3 = gl (g2g3) = (g192)g3. This clearly generalizes to the products
of more than three elements. A special case is a product g ... 9 (k times),
which we will denote by gk (in the additive notations it would be kg). We
also define gO = 1 and g-k = (g-l)k (Og = 0, -kg = k( -g) when written
additively).
8.1.2 Subgroups, Cosets
A subgroup H of a group G is a group contained in G in the following way:
H is contained in G as a set, and the group operations (product, inversion,
and the unit element 1 ) of H are only those of G which are restricted to H.
If H is a subgroup of G, we write H S; G.
Let us now suppose that a group G (now written multiplicatively) has H
as a subgroup. Then, for each 9 E G, the set
gH
=
(8.1)
{gh I h E H}
is called a coset of H (determined by g).
Simple but useful observations can easily be made; the first is that each
element 9 E G belongs to some coset, for instance in gH. This is true because
H, as a subgroup, contains the neutral element 1 and, therefore, also the
contains the element 9 = gl E gH. This observation tells us that the cosets of
a subgroup H cover the whole group G. Notice that, especially for any hE H,
we have hH = H because H, being a group, is closed under multiplication
and each element h1 E H appears in hH (h1 = h(h- 1hd)·
Other observations are given in the following lemmata.
Lemma 8.1.1. glH
1
= g2H
if and only if g1"l g2 E H.
The unit element can be seen as a nullary operation.
8.1 Group Theory
147
Proof. If g1H = g2H, then by necessity g2 = g1h for some h E H. Hence,
g"1 1g2 = h E H. On the other hand, if g"1 1g2 = h E H, then g2 = g1h and so
g2H = g1hH = g1H.
0
Remark 8.1.1. Notice that g"1 1g2 E H if and only if g2 1g1 E H. This is true
because H contains all inverses of the elements in Hand {g"1 1g2)-1 = g2 1g 1.
In the additive notations, the condition g"1 1g2 E H would be written as
-g1
+ g2 E H.
Lemma 8.1.2. Let H be a finite subgroup of G. Then all the cosets of H
have m = IHI elements.
Proof. By the very definition of (8.1), it is clear that each coset can have
at most m elements. If some coset gH has less than m elements, then for
some hi i= h j we must have gh i = gh j . Multiplication by g-1, however, gives
hi = hj , which is a contradiction.
0
Definition 8.1.1. If g1H = g2H, we say that g1 is congruent to g2 modulo
H.
Lemma 8.1.3. If g1H
i= g2H,
then also g1H n g2H =
0.
Proof. Assume, on the contrary, that g1H i= g2H have a common element
g. Since 9 E g1H n g2H, we must have 9 = g1h1 = g2h2 for some elements
h 1, h2 E H. But then g1 = g2h2hi1, and g1H = g2h2hi1 H = g2h2H = g2H,
a contradiction.
0
If G is finite and H :::; G, we call the number of the cosets of H the index
of H in G and denote this number by [G: HI. Since the cosets cover G, we
know by Lemma 8.1.3 that the group G is a disjoint union of [G : HI cosets
of H, which all have the same cardinality IHI by Lemma 8.1.2. Thus, we have
obtained the following theorem
Theorem 8.1.1 (Lagrange). IGI = [G: HI·IHI.
Even though Lagrange's theorem was easy to derive, it has very deep implication: the group structure, which does not look very demanding at first glance,
is complicated enough to heavily restrict the cardinalities of the subgroups;
the subgroup cardinality IHI must always divide IGI. It follows that a group
G with a prime number cardinality can only have the trivial subgroups, the
one consisting of only the unit element and the group G itself.
8.1.3 Factor Groups
A subgroup H :::; G is called normal if, for any 9 E G and h E H, always
ghg- 1 E H. Notice that all the subgroups of an abelian group are normal.
For a normal subgroup H :::; G we define the product of cosets g1H and g2H
by
148
8. Appendix B: Mathematical Background
(8.2)
But, as far as we know, two distinct elements can define the same coset:
g1H = g2H may hold even if gl =f. g2. Can the coset product ever not be
well-defined, Le., that the product would depend on the representatives g1
and g2 which are chosen? The answer is no:
Lemma 8.1.4. If H is a normal subgroup, g1H
then also (glg2)H = (g~g~)H.
=
g~ Hand
g2H
=
g~H,
Proof. By assumption and Lemma 8.1.1, gllg~ = hI E H and g:;1g~ = h2 E
H. But since H is normal, we have
"
( g1g2 ) -1(,
g1g2') = g2-1 gl-1 g1g2
= g2-1h'
192
= g:;1hlg29:;lg~ = g:;lh 1g2h 2 E H.
The conclusion that g:;lh 1g 2h 2 E H is due to the fact that g:;lh 1g2 is in H
because H is normal. Therefore, by Lemma 8.1.1, (glg2)H = (g~g~)H.
0
The coset product offers us a method of defining the factor group G / H:
Definition 8.1.2. Let G be a group and H :S G a normal subgroup. The
factor group G / H has the cosets of H as the group elements and the coset
product as the group operation. The neutral element of G / H is lH = Hand
the inverse of a coset gH is g-1 H. The factor group is also called the quotient
group.
Using the above terminology,
IG/HI
=
[G: H].
Example 8.1.4. Consider Z, the additive group of integers and a fixed integer
n. Then, all the integers divisible by n form a subgroup
n?l
= {... -
3n, -2n, -n, 0, n, 2n, 3n, ... },
as is easily verified. Any coset of the subgroup nZ looks like
k
+ nZ = {... k -
3n, k - 2n, k - n, k, k
+ n, k + 2n, k + 3n ... }
and two integers k 1, k2 are congruent modulo n?l if and only if kl + nZ =
k2 + nZ which, by Lemma 8.1.1, holds if and only if k2 - k1 E n?l, i.e, k2 - k1
is divisible by n. The factor group Z/(nZ) is usually denoted by ?In.
Definition 8.1.3. If integers kl and k2 are congruent modulo nZ, we also
say that kl and k2 are congruent modulo n and denote k1 == k2 (mod n).
Let us pay attention to the subgroups of a finite group G of special type.
Pick any element 9 E G and consider the set
}
{ 9 0 ,g 1 ,g 2 , ....
(8.3)
The set (8.3) is contained in G and must, therefore, be finite. It follows that,
for some i > j, equality gi = gj holds, and multiplication by (gj)-1 gives
gi- j = 1. This supplies the motivation to the following definition.
8.1 Group Theory
149
Definition 8.1.4. Let G be a finite group and let g E G be any element. The
smallest nonnegative integer k such that gk = 1 is called the order of 9 and
denoted by k = ord(g).
It follows that, if k = ord(g), then gl+km = glgkm = gllm = gl for
any integer m. Moreover, it is easy to see that the set (8.3) is an abelian
subgroup of G. In fact, the set (8.3) is clearly closed under multiplication
and the inverse of an element gl can be expressed as g-l+km, where m is any
integer. We say that set (8.3) is a cyclic subgroup of G generated by 9 and
denote that by
k-I}
(g=g,g,g,
)
{ 0 1 2...
,g
.
According to Lagrange's theorem, k
Thus, we have obtained
(8.4)
=
ord(g)
=
l(g)1 always divides
IGI.
Corollary 8.1.1. For each g E G, we have glGI = 1.
Proof. Since IGI is divisible by k = ord(g), we can write
integer l, and then glGI = gk.1 = 11 = 1.
8.1.4 Group
IGI =
k . l for some
0
Z~
Consider again the group Zn. Although the group operation is the coset
addition, it is also easy to see that the coset product
is a well-defined concept. In fact, suppose that kl + nZ = k~ + nZ and
k2 + nZ = k2 + nil. Then kl - k~ and k2 - k2 are divisible by n, and also
klk2-k~ k2 = (kl -kDk2+k~ (k2 -k 2) is divisible by n. Therefore, k 1 k 2+nZ =
k~k2 + nZ.
Coset 1 + nZ is clearly a neutral element with respect to multiplication,
but the cosets k + nZ do not form a multiplicative group, the reason being
that the cosets k + nZ such that gcd(k, n) > 1 do not have any inverse. To
see this, assume that k + nZ has an inverse k' + nZ, i.e., kk' + nZ = 1 + nZ.
This means then that kk' - 1 is divisible by n and, hence. also divisible by
gcd(n, k). But then 1 would also be divisible by gcd(n, k), which is absurd
because gcd( n, k) > 1.
To remove the problem of non-invertible elements, we define Z~ to be
the set of all the cosets k + nZ that have a multiplicative inverse; thus, Z~
becomes a multiplicative group. 2 Previously, we saw that all cosets k + nil
with gcd(k, n) > 1 do not belong to il~ and next we will demonstrate that
2
To be more algebraic, the group Z;. is the unit group of ring Zn. The fact that nZ
is an ideal of ring Z automatically implies that the coset product is a well-defined
concept.
150
8. Appendix B: Mathematical Background
consists exactly of cosets k + nZ such that gcd(k, n) = 1 (Notice that this
property is independent of the representative k chosen). For that purpose we
have to find an inverse for each such coset, which will be an easy task after
the following lemma.
Z~
Lemma 8.1.5 (Bezout's identity). For any natural numbers x, y there
are integers a and b such that ax + by = gcd(x, y).
Proof. By induction on M = max{x, y}. If M = 1, then necessarily x = y = 1
and 2 ·1-1·1 = 1 = gcd(l, 1). For the general case we can assume, without
loss of generality, that M = x> y. Because gcd(x - y, y) is a divisor of x - y
and y, it also divides x, so gcd(x - y, y) :5 gcd(x, y). Similarly, gcd(x, y) :5
gcd(x - y, y) and, hence, gcd(x - y, y) = gcd(x, y). We apply the induction
hypothesis to the pair (x - y, y) to get numbers a' and b' such that
a'(x - y)
+ b'y = gcd(x -
y, y)
= gcd(x, y),
which gives the required numbers a
= a', b = b' -
a'.
o
If gcd(k, n) = 1, we can use the previous lemma to find the inverse of the
coset k + nZ: let a and b be the integers such that ak + bn = 1. We claim
that the coset a + nZ is the required multiplicative inverse. But this is easily
verified: since ak -1 is divisible by n, we have ak + nZ = 1 + nZ and therefore
(a + nZ)(k + nZ) = 1 + nZ.
How many elements are there in group Z~? Since all the cosets of nZ
can be represented as k + nZ, where k ranges over the set {D, ... , n - I},
we have to find out how many of those k values satisfy the extra condition
gcd(k, n) = 1. The number of such values of k is denoted by cp(n) = IZ~I and
is called Euler's cp-function.
Let us first consider the case that n = pm is a prime power. Then, only
the number.s Dp, Ip, 2p, ... , (pm-l - l)p in the set {D, 1, ... ,pm - I} do
not satisfy the condition gcd( k, pm) = 1. Therefore, cp(pm) = pm _ pm-I.
Especially cp(p) = p - 1 for prime numbers.
Assume now that n = nl ... n r , where numbers ni are pairwise coprime,
i.e., gcd(ni,nj) = 1 whenever if:. j. We will demonstrate that then cp(n) =
cp(nl)'" cp(nr) and, for that purpose, we present a well-known result:
Theorem 8.1.2 (Chinese Remainder Theorem). Let n = nl ... nr with
gcd(ni' nj) = 1 when i f:. j. Then, for given cosets k i + niZ, i E {I, ... , r}
there exists a unique coset k + nZ of nZ such that for each i E {I, ... , r}
k
+ niZ = ki + niZ.
Proof. Let mi = n/ni = nl ... ni-lni+1 ... n r . Then, clearly gcd(mi' ni) = 1,
and, according to Lemma 8.1.5, aimi + bini = 1 for some integers ai and bi .
Let
8.1 Group Theory
151
Then
is divisible by ni, since each mj with j :f. i is divisible as well, and aimi -1 is
also divisible by ni because aimi +bini = 1. It follows that k+niZ = k i +niZ,
which proves the existence of such a coset k + nZ. If also k' + niZ = k i + niZ
for each i, then k' - k is divisible by each ni and, since the numbers ni are
coprime, k' - k is also divisible by n = nl ... n r . Hence, k' + nZ = k + nZ.
o
Clearly, any coset k + nZ E Z~ defines r cosets k + niZ E Z~i. But in
accordance with Chinese Remainder Theorem, all the cosets k i + niZ E Z~i
are obtained in this way. To show that, we must demonstrate that, if we
have gcd(ki' ni) = 1 for each i, then the k which is given by the Chinese
Remainder Theorem also satisfies gcd(k, n) = 1. But this is straightforward:
if gcd(k, n) > 1, then also d = gcd(k', ni) > 1 for some i and some k' such
that k = k'k". Then, however, k'k" + niZ = k + niZ (j. Z~. It follows that
and therefore,
cp(nl)··· cp(nr)
= IZ~ll·· ·IZ~J
= IZ~l x ... x Z~J = IZ~I = cp(n).
Now we are ready to count the cardinality of Z~: let n
prime factorization of n, Pi :f. Pj whenever i :f. j. Then
cp(n)
=
cp(p~l) ... cp(p~r)
=
p~l
... p~r be the
(p~l _ p~l-I) ... (p~r _ p~r-I)
= p~l (1 - ~) ... p~r(1
PI
=
-
~) = n(l - ~) ... (1 - ~).
PI
Pr
Pr
By expressing the Corollary 8.1.1 on Z~ using the notations of Definition
(8.1.3), we have the following corollary.
Corollary 8.1.2.
If gcd(a, n) = 1, then a'l'(n)
This result is known as Euler's theorem. If n
and Euler's theorem can be formulated as
aP-
1
=: 1
=: 1 (mod n).
= P is a prime, then cp(p) = p-l
(mod p).
Congruence (8.5) is known as Fermat's little theorem.
8.1.5 Group Morphisms
Let G and H be two groups written multiplicatively.
(8.5)
152
8. Appendix B: Mathematical Background
Definition 8.1.5. A mapping f : G ---+ H such that f(glg2) = f(gI)f(g2)
for all gl, g2 EGis called a group morphism from G to H.
It follows that f(l) = f(1 . 1) = f(l)f(l) and, hence, f(l) = 1. Moreover,
1 = f(l) = f(gg-l) = f(g)f(g-I), which implies that f(g-l) = f(g)-I. By
induction it also follows that f(gk) = f(g)k for any integer k.
Definition 8.1.6. Let f : G ---+ H be a group morphism.
1. The set Ker(f) = f-l(l) = {g E G I f(g) = I} is called the kernel of f·
2. The set Im(f) = f(G) = {f(g) I 9 E G} is called the image or the range of
f·
If f(k 1) = f(k 2) = 1, then also f(k 1k 2) = 1 and f(k 1 1) = 1, i.e., Ker(f)
is closed under multiplication and inversion, which means that Ker(f) is a
subgroup of G. In fact, if k E Ker(f)' then also
f(gkg- 1) = f(g)f(k)f(g)-1
= f(g)f(g)-1 = 1,
so Ker(f) is even a normal subgroup. Similarly, it can be seen that Im(f) is
a subgroup of H, not necessarily normal.
A morphism f : G ---+ H is injective if gl =I g2 implies f(gl) =I f(g2),
surjective if f(G) = H and bijective if it is both injective and surjective.
Injective, surjective and bijective morphisms are called monomorphisms, epimorphisms, and isomorphisms respectively.
Two groups G and G' are called isomorphic, denoted by G :::: G', if there
is an isomorphism f : G ---+ G'. Notice that two isomorphic groups differ only
in notations (any element 9 is replaced with f(g)).
An interesting property is given in the following lemma:
Lemma 8.1.6. Let f : G ---+ H be a group morphism. Then the factor group
G /Ker(f) is isomorphic to Im( H).
Proof. First notice that Im(H) is always a subgroup of H and that Ker(f)
is a normal subgroup of G, so the factor group G /Ker(f) can be defined. We
will denote K = Ker(f) for short and define a function F : G/ K ---+ Im( H)
by F(gK) = f(g). The first thing to be verified is that F is well-defined, i.e,
the value of F does not depend on the choice of the coset representative g.
But this is straightforward: If glK = g2K, then by Lemma 8.1.1, g1 1g2 E
K and, by the definition of K, f(g1 1g2) = 1, which implies that f(gI) =
f(g2) and F(gIK) = f(gl) = f(g2) = F(g2K). It is clear that F is a group
morphism. The injectivity can be seen as follows: if F(gIK) = F(g2K), then
by definition, f(gl) = f(g2), hence j(g1 1g2) = 1, which means that g1 1g2 E
K. But this is to say, by Lemma 8.1.1, that glK = g2K. It is clear that F is
surjective, hence F is an isomorphism.
D
8.1 Group Theory
153
8.1.6 Direct Product
Let G and G' be two groups. We can give the Cartesian product G x G' a
group structure by defining
It is easy to see that G x G' becomes a group with (1, I') as the neutral
element 3 and (g, g') f-t (g-l, g,-I) as the inversion. The group G x G' is
called the (outer) direct product of G and G'. The direct product is essentially
commutative, since G x G' and G' x G are isomorphic; an isomorphism is
given by (g,g') f-t (g', g). Also, (G 1 XG 2)XG3 ~ G 1 X(G 2 XG 3) (isomorphism
given by ((gl,g2),g3) f-t (gb(g2,g3)), so we may as well write this product
as G 1 x G 2 X G3. This generalizes naturally to the direct products of any
number of groups.
If a group G has subgroups HI and H2 such that G is isomorphic to
HI x H 2, we say that G is the (inner) direct product of subgroups HI and H 2.
We then write also G = HI X H 2. Notice also that in this case, the subgroup
HI (and also H 2) is normal. To see this, we identify, with some abuse of
notations, G and HI x H2 and HI = {(h, 1) I hE HI}. Then
(hI, h2)(h, l)(hb h 2)-1 = (hlhhll, 1) E HI,
as required.
The following lemma will make the notion of the factor group more natural:
Lemma 8.1.7. JfG
= HI x H2 (inner) then G/H1 ~ H 2.
Proof. Since G = HI X H 2, there is an isomorphism f : G --+ HI X H2
which gives each 9 EGa representation f(g) = (hI, h 2), where hI E HI
and h2 E H 2. A mapping p : HI x H2 --+ H2 defined by p(hI, h 2) = h2 is
evidently a surjective morphism, called a projection onto H 2. Because f is
an isomorphism, the concatenated mapping pf : G --+ H2 is also a surjective
morphism. Now p(f(g)) = 1 if and only if f(g) = (hI, 1), which happens if
and only if 9 E HI. This means that Ker(pJ) = HI, so by Lemma 8.1.6 we
have G/H1 ~ H 2.
0
Example B.l.5. Consider again the group Z~, where n =
decomposition of n. It can be shown that the mapping
p~l
... p~r is a prime
given by the Chinese Remainder Theorem is an isomorphism.
3
Here 1 and I' are the neutral elements of G and G' respectively.
154
8. Appendix B: Mathematical Background
8.2 Fourier Transforms
8.2.1 Characters of Abelian Groups
Let G be a finite abelian group written additively. A character of G is a
morphism X : G -+ C \ {O}, i.e., each character satisfies the condition X(gl +
g2) = X(gl)x(g2) for any elements gl and g2 E G. It follows that X(O) = l.
Denoting n = IGI, we also have by Corollary (8.1.1) that X(g)n = x(ng) =
X(O) = 1, so any character value is a nth root of unity.4
An interesting property of characters is that for two characters Xl, X2,
we can define the product character X1X2 : G -+ C by X1X2(g) = X1(g)X2(g).
Moreover, it is easy to see that, with respect to this product, the characters
also form an abelian group called the character group or the dual group of
G and denoted by C. The neutral element XO of the dual group is called the
principal character or the trivial character and is defined by XO (g) = 1 for
each element 9 E G. The inverse of a character X is character X- 1 defined by
X- 1(g) = X(g)-l.
Example 8.2.1. Let us determine the characters of a cyclic group
G = {g, 2g, ... , (n - l)g, ng = O}.
It is easy to see that any cyclic group of order n is isomorphic to Zn, the
additive group of integers modulo n, so we can consider Zn as a "prototype"
when studying cyclic groups. For any fixed y E Z we define mapping Xy :
Z -+ C by
Since e 27ri = 1, Xy has period n, so we can, in fact, consider Xy as a mapping
Zn -+ C. Moreover, since Xy(x) = Xx(Y), we can also assume that y E Zn
instead of y E Z. Now
Xy(x
+ z) = e
21t'iy(x+z)
n
271'ixy
21'!'uy
= e-n-e-n- =
Xy(x)Xy(z),
which means that each Xy is, in fact, a character of Zn. Moreover, if y and
z are some representatives of distinct cosets modulo n, then also Xy and
Xz are different. Namely, if Xy = Xz, then especially Xy(l) = Xz(l), i.e.,
e 2:: y = e h;z , which implies that y = z + k . n for some integer k. But then
y and z represent the same coset, a contradiction.
It is straightforward to see that XaXb = Xa+b, so the characters of Zn also
form a cyclic group of order n, generated by Xl, for instance. This can be
summarized as follows: the character group of Zn is isomorphic to Zn; hence,
the character group of any cyclic group is isomorphic to the group itself.
4
A nth root of unity is a complex number x satisfying xn
=
l.
8.2 Fourier Transforms
155
The fact that the dual group of a cyclic group is isomorphic to the group
itself, can be generalized: a well-known theorem states that any finite abelian
group G can be expressed as a direct sum5 (or as a direct product, if the
group operation is thought as multiplication) of cyclic groups G i :
(8.6)
Lemma 8.2.1. Let G be an abelian 9rouP as above. Then G :::
G.
Proof By Example 8.2.1 we know that for each i, Gi ::: Gi , so it suffices to
demonstrate that G = G1 X ... x Gm. For that purpose, let Xl. ... ,Xm be
some characters of G 1 , ... , Gm . Since G = G 1 EB .. . EBGm , each element 9 E G
can be uniquely expressed as
9 =91
+ ... +9m,
where 9i E G i . This allows us to define a function X : G --t C \ {O} by
(8.7)
It is now easy to see that X is a character of G. Moreover, if X~ =I- Xi, then
there is an element 9i E G i such that XH9i) =I- Xi(9i) and, hence,
X'(9i)
= X1(0) ... X~(9i) ... Xm(O)
=I- Xl (0) ... Xi(9i) ... Xm(O) = X(9i),
which means that characters of G defined by Equation (8.7) are all different
for different choices of Xl. ... , Xm·
On the other hand, all the characters of G can be expressed as in (8.7).
For, if X is a character of G, we can define Xi by restricting X to G i . It is
easy to see that each Xi is a character of G i and that X = Xl ... Xm.
0
As an application, we will find some important characters.
Example 8.2.2. Consider 1FT, the m-dimensional vector space over the binary
field. Each element in the additive group of 1FT has order 2, so group 1FT has
a simple decomposition:
IF2 = IF2 EB ... EB IF2.
'm-components
----'
Now it suffices to determine the characters of IF2, since by Lemma 8.2.1,
characters of 1FT are just the m-fold products of the characters of IF 2 . But
the characters of IF2 = Z2 were already found in Example 8.2.1: They are
Xy(x) = e
5
21fi:z:y
2
= (_l)XY
Group G is a direct sum of subgroups G I , ... , G m if each element of G has
unique representation g = gl + ... + gm, where gi E G i .
156
8. Appendix B: Mathematical Background
for Y E {O, 1}. Therefore, any character of 1F2' can be written as
where x . Y =
x
XlYl
+ ... + XmYm
= (Xl, ... ,xm ) and Y = (Yl,'"
is the standard inner product of vectors
,Ym).
8.2.2 Orthogonality of the Characters
Let G = {gl,'" ,gn} be a finite abelian group. The functions f : G --+ C
clearly form a vector space V over C, addition and scalar multiplication
defined pointwise: (f + h)(g) = f(g) + h(g) and (c· f)(g) = c· f(g) (see
also Section 8.3). We also have an alternative way of thinking about V: each
function f : G --+ C can be considered as an n-tuple
(8.8)
so the vector space V evidently has dimension n. The natural basis of V is
given by el = (1,0, ... ,0), e2 = (0,1, ... ,0), ... , en = (0,0, ... ,1). In other
words, ei is a function G --+ C defined as
1, if i = j
ei(gj) = { 0, otherwise.
The standard inner prod uct in space V is defined by
n
(f I h) = L !*(gi)h(gi),
(8.9)
i=l
and the inner product also induces norm in the very natural way:
Ilhll = J(hlh).
Basis {el,'" en} is clearly orthogonal with respect to the standard inner
product. Another orthogonal basis is the characters basis:
Lemma 8.2.2. If Xi and Xj are characters of G, then
0, if i f:. j
(xiIXj)= { n,ifi=j.
(8.10)
Proof. First, it is worth noticing that
1 = Ix(g)1 2 = X*(g)X(g),
which implies that x*(g) = X(g)-l for any 9 E G. Then
(Xi I Xj)
n
=L
k=l
n
X;(gk)Xj(gk)
=L
k=l
n
X;1(9k)Xj(9k)
= L(X;lXj)(9k).
k=l
8.2 Fourier Transforms
157
If i = j, then XilXj is the principal character and the claim for i = j follows
immediately. On the other hand, if i =f=. j, then X = XilXj is a nontrivial
character of G and it suffices to show that
n
S
= LX(9k) = 0
k=l
for a nontrivial character X. Because of nontriviality, there exists an element
9 E G such that x(g) =f=. 1. Furthermore, mapping 9 H 9 + gi is a permutation
of G for any fixed g, so
n
n
n
k=l
k=l
k=l
The claim S = 0 follows now from the equation (1 - X(g))S
o
= O.
Since the characters are orthogonal and there are n = IGI of them, they
also form a basis. By scaling the characters to the unit norm, we obtain an
orthonormal basis B = {B l , ... , B n }, where
Other interesting features of the characters can be easily deduced: let us
define a matrix X E en x n by
By denoting the transposed complex conjugate of X by X*, since Xij
Xi(gi), we get
n
(X* X)ij
=
L X;kXkj
k=l
n
= L X:(gk)Xj(gk)
= (Xi
I Xj)
k=l
=
{O,n,lfz=],
~f ~ =f=. j
which actually states that X* X = nI, which implies that X- l
since any matrix commutes with its inverse, we also have XX*
can be written as
or as
= ~X*.
= nI,
But
which
158
8. Appendix B: Mathematical Background
~
() * ()
~ Xk gi Xk gj =
k=l
{O,n ifif ii =f.= Jj..
(8.11)
'
Equations (8.10) and (8.11) are known as the orthogonality relations of characters. Notice also that, by choosing Xj as the principal character in (8.10)
and gj as the neutral element in (8.11), we get a useful corollary:
Corollary 8.2.1.
{n,
{n,
~ ( )=
~ X gk
k=l
~
() =
~ Xk 9
k=l
if X is the principal chamcter
0 otherwise.
'
(8.12)
if 9 is the neutml element
0 otherwise.
'
(8.13)
1F2' we have a nice symmetry: Xi(gj) = Xj(gi). Therefore,
the above formulae can be contracted to one formula in both groups. In Zn
it becomes
In groups Zn and
L
e
2"ixy
n
==
{
yEZn
and in
n, if x = 0
0, otherwise,
1F2' we have
"
(-l):Z:.Y
~
yEIF2'
= {20,m , otherWIse.
if X =?
8.2.3 Discrete Fourier Transform
We have now all the tools for defining the discrete Fourier transform: any
element f E V (recall that V is the vector space of functions G --+ C) has
unique representation with respect to basis B = {JnXI, ... , JnXn}:
f
=
hB1 + ... + fnBn.
(8.14)
f:
Definition 8.2.1. The function
G --+ C defined by
h
where
is the coefficient of Bi in the representation (8.14) is called the
discrete Fourier tmnsform of f.
Because B is an orthonormal basis, we can easily extract any coefficient fi
by calculating the inner product of Bi and (8.14):
(Bi I f)
n
n
j=l
j=l
= L(Bi I J;Bj ) = L
J;(B i I B j )
=K
8.2 Fourier Transforms
159
Thus, the Fourier transform can be written as
(8.15)
By its very definition, it is clear that
functions f, h and any c E C.
-r+h = 1+ Ii and -;;j = c1 for any
Example 8.2.3. An interesting corollary of the orthogonality relation (8.11)
can be easily derived:
IIill
2
=
(11 1) =
=
t; vn
1
n
1
n
n
L!*(9i)1(9i)
i=1
n
t;Xi(9k)f*(9k)
n
1
n
vn {;X:(91)f(91)
n
= - ~~f*(9k)f(91) ~Xi(9k)X:(9!)
n
1
= ;;:
k=11=1
L
n
i=1
f* (9k)f(9k)n = (J I J) = IIfll2.
k=1
The equation
IIill =
IIfil thus obtained is known as Parseval's identity.
Example 8.2.4. Let G =
are all of form
::En.
As we saw in Example (8.2.1), the characters of
::En
27ri:r:y
Xy(x) = e n ,
and, therefore, the Fourier transform of a function
~
f(x) =
f : ::En
~
C takes form
2",,,y
vn1 """"'
~ ef(y)·
n
yEZ"
Example 8.2.5. The characters of group G = IFr are
so the Fourier transform of f
i(x) =
~
L
v2··' lIEF2'
: IFr
~
C looks like
(-l)"'·lIf(y).
This Fourier transform in IFr is also called a Hadamard transform, Walsh
transform, or Hadamard- Walsh transform.
160
8. Appendix B: Mathematical Background
8.2.4 The Inverse Fourier Transform
Notice that Equation (8.15) can be rewritten as
(8.16)
and that the matrix appearing in the right-hand side of (8.16) is X*, the
transpose complex conjugate of matrix X defined as X ij = Xj(gi).6 By multiplying (8.16) by X we get
(8.17)
which gives the idea for the following definition.
Definition 8.2.2. Let f : G -+ C be a function. The inverse Fourier transform of f is defined to be
(8.18)
Keeping equations (8.16) and (8.17) in mind, it is clear that
f = 1= f.
Example 8.2.6. In lEn we have Xx(y) = Xy(x), so the inverse Fourier transform is quite symmetric to the Fourier transform:
!(x) =
~
L
e 2"~xy f(y).
yE'Zn
In group
-
f(x)
6
lFi
the symmetry is perfect:
=
/'i\ffi L.J (-l)""Yf(y)
1"
v 2···
Y
ElF'"
~
= f(x).
2
Equation (8.16) makes Parseval's identity even clearer: matrices
are unitary and, therefore, they preserve the norm in V.
JnX and JnX·
8.3 Linear Algebra
161
8.2.5 Fourier Transform and Periodicity
We will now give an illustration on how powerfully the Fourier transform can
extract information about the periodicity.
Let f : G -t C be a function with period pEG, i.e., f(g + p) = f(g) for
any 9 E G. Then
~
1 n
f(gi) = y'ri LX:(9k)f(9k)
k=l
1
n
= y'riLXi(9k+P-p)*f(9k+P)
k=l
= X:( -p)
In t
X:(gk
+ p)f(gk + p)
k=l
= X:( -p)f(gi) ,
which implies that f(gi)
= 0 whenever X: (-p) = Xi(-p)-IXi(P) =/:-1.
Example 8.2. 7. In group lEn the above equation takes form
-
f(x) = e-
27ri:x:pn
f(x),
•
-
2wia:p.
whIch means that f(x) = 0 whenever e- n =/:- 1, whIch happens exactly
when xp is not divisible by n. If also gcd(p, n) = 1, then -xp can be divisible
by n and is so if and only if x is.
8.3 Linear Algebra
8.3.1 Preliminaries
A vector space over complex numbers is an abelian group V (usually written
additively) that is equipped with scalar multiplication, which is a mapping
C x V -t V. The scalar multiplication is usually denoted by (c,x) H cx.
It is required that the following vector space axioms must be fulfilled for all
C,CI,C2 E C and X,XI,X2 E V:
1. CI(C2X) = (CIC2)X.
2. (CI + C2)X = CIX + C2X.
3. C(XI + X2) = CXI
4.1x = x.
+ CX2.
Again, to be precise, we should talk about vector space (V, +, 0, -, C,·) instead of space V, but to simplify the notations we identify the space and the
underlying set V. The elements of V are called vectors.
162
8. Appendix B: Mathematical Background
Example 8.3.1. Let V
en equipped with sum
= en,
the set of n-tuples over complex numbers. Set
zero element (0, ... ,0) and inversion
is evidently an abelian group. This set equipped with scalar multiplication
becomes a vector space, as is easily verified.
Example 8.3.2. The set of functions x : N
~
e,
for which the series
00
is convergent, also constitutes a vector space over Co The sum and scalar
product are again defined pointwise: for functions x and y, (x + y)(n) =
x{n) + y(n) and (cx)(n) = cx(n). To simplify the notations, we write also
Xn = x{n). To verify that the sum of two elements stays in the space, we
also have to check that the series L:~l IX n + Ynl 2 is convergent whenever
L:~=l IXnl2 and L:~=l IYnl 2 are convergent. For this purpose, we can use
estimations
Ix + Yl2 = IxI 2+ 2Re{x*y) + lyl2
:S Ixl2 + 21xllyl + lyl2:S 21xl2 + 21yl2
for each summand. The vector space of this example is denoted by £2(C).
Notice that the domain N in the definition could be replaced with any numerable set. If, instead of N, there is a finite domain {I, ... , n}, the convergence
condition would be unnecessary and the space would become essentially the
same as en.
Definition 8.3.1. A linear combination of vectors Xl, ... , Xn is a finite sum
+ ... + CnX n . A set S c V is called linearly independent if
C1Xl
implies Cl = ... = Cn = 0 whenever Xl,
independent is linearly dependent.
... , Xn
E
S. A set that is not linearly
Definition 8.3.2. For any set S ~ V, £(S) is the set of all linear combinations of vectors in S. By its very definition, set £(S) is a vector space
contained in V. We say that £(S) is generated by S and also call £(S) the
span of S.
8.3 Linear Algebra
163
The proof of the following lemma is left as an exercise.
Lemma 8.3.1. A set 8 <; V is linearly dependent if and only if some vector
x E 8 can be represented as a linear combination of vectors in 8 \ {x}.
In ligth of the previous lemma, a linearly dependent set contains some "unnecessary" elements in the sense that they are not needed when making
linear combinations. For, if x E 8 can be expressed as a linear combination
of 8\ {x}, then clearly L(8\ {x}) = L(8) (cf. Exercise 1).
Definition 8.3.3. A set Be V is a basis of V if V
independent.
= L(B) and B is linearly
It can be shown that all the bases have the same cardinality (See Exercise
2). This gives the following definition.
Definition 8.3.4. The dimension dim (V) of a vector space V is the cardinality of a basis of V.
Example 8.3.3. Vectors el
(0,0, ... , 1) form a basis of
has dimension n.
=
(1,0, ... ,0), e2
=
cn . This is a so-called
(0,1, ... ,0), ... , en =
natural basis. Thus C n ,
Example 8.3.4. As in the previous example, we could define ei E L 2 (C) by
I, if i = n
ei(n) = { 0, if i -# n
for each i E N. It is clear that the set e = {el, e2, e3, ... } is linearly independent, but it is not a basis of L 2 (C) in the sense of definition (8.3.3),
because there are vectors in L 2 (C) that cannot be expressed as a linear combination of vectors in e. This is because, according to definition (8.3.1), a
linear combination is a finite sum of vectors (Exercise 4).
Definition 8.3.5. A subset W <; V is a subspace of V if W is a subgroup
of V that is closed under scalar multiplication.
Example 8.3.5. Consider the n-dimensional vector space C n . Set
is clearly a subspace of
n-l.
cn having el,
... en-l as basis. Hence, dim(W) =
Example 8.3.6. Let
W = {x E L 2 (C)
I Xn -# 0 only for finitely many n EN}.
It is straightforward to see that W is a subspace of L 2 (C) and that the set
of example (8.3.4) is a basis of W.
e
164
8. Appendix B: Mathematical Background
8.3.2 Inner Product
A natural way to introduce geometry into a complex vector space V (a vector
space over C n ) is to define the inner product.
Definition 8.3.6. An inner product on V is a mapping Vx V -+ C, (x, y) H
(x I y) that satisfies the following conditions for any C E C and any x, y,
and z E V
1. (x
2. (x
3. (x
I y) = (x I y)*.
I x) ;::: 0 and (x I x) = 0 if and only if x =
I clY + C2Z) = Cl(X I y) + C2(X I z).
O.
In axiom 1 and hereafter, * means complex conjugation. Notice that by axiom
1 it follows that (x I x) is always real, so axiom 2 makes sense. From axioms
1 and 3 it follows that
A vector space equipped with an inner product is also called an inner product
space.
Example 8.3.7. For vectors x = (Xl, .. . , x n), and Y = (Yl,"" Yn) in
formula
(x I y)
cn
the
= XiYl + ... + x~Yn
defines an inner product, as is easily verified. An inner product of x and
y E £2(([;) can be defined by formula
(8.19)
since the series (8.19) is convergent. This is because
and the series L~=l IXn l2 and L~=l IXnl2 converge by the very definition of
£2(([;).
Definition 8.3.7. Vectors x and yare orthogonal if (x I y) = O. Two subsets Sl ~ V and S2 ~ V are mutually orthogonal if all Xl E Sl and X2 E S2
are orthogonal. A set S ~ V is orthogonal if Xl E Sand X2 E S are orthogonal whenever Xl =I X2
8.3 Linear Algebra
165
An orthogonal set that does not contain the zero vector 0 is always linearly
independent: for, if S is such a set and
is a linear combination of vectors in S, then
0) = 0, which implies Ci = 0 since Xi =1= o.
Example 8.3.8. Let W
Wl.
= {x
~
Ci(Xi
1 Xi)
= (Xi
1 CiXi)
= (Xi
1
V be a subspace. Then also
E V 1 (x 1 y)
= 0 for
all YEW}
is a subspace of V, the so-called orthogonal complement of W.
Lemma 8.3.2 (Cauchy-Schwartz inequality). For any vectors x, y E V,
I(x 1 y)1 2 S (x 1 x)(y 1 y).
Proof. If y = 0, then (x 10) = (x 10 + 0) = (x 1 0) + (x 1 0), so (x 10) = 0
and the claim holds with equality. If y =1= 0, we can define A = - i~I:;. Then,
by the inner product axiom 2,
Os (x + AY 1x + AY) = (x 1x) + A(X 1y) + A*(Y 1x) + IAI2 (y 1y),
which can also be written as
oS
(x 1 x) -
~: : :~ (x
1
y),
and the claim follows.
D
Definition 8.3.8. A norm on a vector space is a mapping V
such that, for all vectors x and y, and c E C we have
l·lIxll 2: 0 and IIxll = 0 if and only if x
2·lIcxll = Iclllxll·
3. IIx + yll S IIxll + lIyll·
=
~
JR., x
H
Ilxll
o.
A vector space equipped with a norm is also called a normed space . Any
inner product always induces a norm by
IIxll
= vTxTX),
so an inner product space is always a normed space as well. To verify that
vTxTX) is a norm, it is easy to check that the conditions 1 and 2 are fulfilled;
and for condition 3, we use the Cauchy-Schwartz inequality, which can also
be stated as I(x I y)1
IIxlillyli. Thus,
s
IIx
+ Yll2
= (x
s
+ y I x + y)
= (lIxll + IIY11)2.
+ 2Re(x I y) + lIyW
+ lIyl12 s IlxW + 211xlillyll + lIyll2
= IIxl12
II x ll 2 + 21(x I y)1
166
8. Appendix B: Mathematical Background
Any norm can be used to define the distance on V:
d(x, y)
= Ilx - YII·
One can easily verify that d : V x V -t lR satisfies the axioms of a distance
function:
For each x, y, and Z E V,
1 d(x,y)=d(y,x)::::O.
2 d(x, y) = 0 if and only if x
3 d(x, z) ::; d(x, y) + d(y, z).
= y.
A characteristic property of a normed space that is also an inner product
space is that
(8.20)
holds for any x, y E V (See Exercise 5). Equation (8.20) is called the Parallelogram rule.
Definition 8.3.9. Let V be a normed space. If, for each sequence of vectors
Xl, X2, ... such that
lim Ilxm
m,n-+oo
-
xnll
there exists a vector
X
lim Ilxn - xii =
m-too
= 0,
such that
0,
we say that V is complete.
A class of vectors space having great importance in quantum physics and
in quantum computation is introduced in the following definition.
Definition 8.3.10. An inner product space V is a Hilbert space if V is complete with respect to the norm induced by the inner product.
Example 8.3.9. Both en and L 2 (<<::) are Hilbert spaces, but the subspace W
of L 2 (<<::) in example (8.3.5) is not (cf. Exercise 6).
Definition 8.3.11. A vector space V is a direct sum of subspaces WI and
= WI EB W 2 , if V is a direct sum of the subgroups WI and
W2•
W 2 , denoted by V
The next lemma shows that Hilbert spaces are structurally well-behaving:
Lemma 8.3.3. If a subspace W of a Hilbert space V is itself W a Hilbert
space, then V = WEB W.L (see Example (8. 3. 8}}.
Definition 8.3.12. Let V and W be complex vector spaces. A mapping f :
V -t W is a vector space morphism if
f(clx
+ C2Y) =
cd(x)
+ c2!(y)
for each x, y E V and CI, C2 E
linear mappings or operators.
e. Vector space morphisms are also called
8.4 Number Theory
167
8.4 Number Theory
8.4.1 Euclid's Algorithm
It should be emphasized that the proof of Lemma 8.1.5 already supplies a
recursive method for computing the greatest common divisor of two given
natural numbers:
gcd(x - y, y), if x> y
gcd(x, y) = { gcd(x, y - x), ~f x < y
x,
If x = y.
It is clear that this method always gives gcd(x, y) correctly. However, the
method is not quite efficient: for instance, in computing gcd(x, 1) it proceeds
as
gcd(x, 1)
= gcd(x - 1,1) = gcd(x - 2, 1) = ... = gcd(l, 1) = 1,
so it takes x recursive calls to perform this computation. Keeping in mind that
the decimal representation of x has length £( x) = lloglO X J, the computational
time is proportional to lOl(x) , i.e., the current algorithm is exponential in £(x).
A more efficient method is given by Euclid's algorithm, whose core is given
in the following lemma.
Lemma 8.4.1. Given natuml numbeTs x > y. there are unique integeTs d
and T such that x = dy + T wheTe 0 :S T < y. NumbeTs y and d can be found
by using £(x)2 elementary arithmetic opemtions (multiplication, addition, and
subtmction of two digits).
Euclid's algorithm is based on consecutive application of the previous lemma:
given x > y, we can find representations
x=dIY+TI,
Y
Tl
= d2Tl + T2,
= d3T2 + T3,
Tn-2 = dnTn-1
Tn-I
+ Tn,
= dn+ITn .
(8.21 )
(8.22)
(8.23)
(8.24)
The above procedure certainly terminates with some Tn+! = 0, since Tl, T2,
... is a strictly descending sequence of non-negative numbers. We claim that
gcd(x, y) equals to Tn, the last nonzero remainder in this procedure. First, by
(8.24), Tn divides Tn-I. By (8.23) Tn also divides Tn-2. Continuing this way,
we see that Tn divides all the numbers Ti and also x and y. Therefore, Tn is
a common divisor of x and y. Assume, then, that there is another number
s that divides both x and y. By (8.21), s divides TI' By (8.22), s divides
168
8. Appendix B: Mathematical Background
r2 and again continuing this reasoning, we see that s eventually divides rn.
Therefore rn is the greatest common divisor of x and y.
How rapid is this method? Since each ri < x, we know that each iteration
step can be done in a time proportional to .e(x)2. Thus, it suffices to analyze
how many times we need to iterate in order to reach r n+1 = O. Because one
single iteration step gives an equation
where 0::; ri < ri-l, we have ri = ri-2 -diri-l < ri-2 -diri. The estimation
ri-2 > (1 + di)ri 2:: 2ri follows easily. Therefore,
x> y
> rl > 2r3 > 4r5 ... > 2ir2i+1'
n-l
n-l
If n is odd, we have x > 2-rrn 2:: 2-r. For an even n, we also have
n-2
n-2
n-2
X > 2-rrn _l > 2-rr n , so the inequality x > 2-2- holds in any case. It
follows that .e(x) = lloglO xJ > 10glO X - 1 > n2'2loglO 2 - 1. Therefore,
n<2I2 (.e(x) + 1) + 2 = O(.e(n)). As a conclusion, we have that, for numoglO
bers x > y, gcd(x, y) can be computed using O(.e(x)3) elementary arithmetic
operations.
8.4.2 Continued Fractions
Example 8.4.1. Let us apply Euclid's algorithm on pair (263,189):
263 = 1·189 + 74,
189 = 2·74 + 41,
74=1·41+33,
41 = 1· 33 + 8,
33 = 4·8 + 1,
and the algorithm terminates at the next round giving gcd(263, 189)
divisions we obtain
263
74
189 = 1 + 189'
189 _ 2 41
74 - + 74'
74
33
41 = 1 + 41'
41
8
33 = 1 + 33'
33
1
"8 = 4+ 8'
= 1. By
a set of equations such that the fraction occurring on the right-hand side is
the inverse of the fraction on the left-hand side of the next equation. We can
thus combine these equations to get a representation
8.4 Number Theory
263
-
189
= 1+
1
2+
1
169
(8.25)
__--:;-__
1
1+
1
1+--1
4+ 8
Expression (8.25) is an example of a finite continued fraction.
It is clear that this procedure can be done for any positive rational number
0: = ~ 2: 1 in the lowest terms.
For other values of 0:, there is a unique way to express 0: = ao + {3 where
ao E Z and (3 E [0,1). If {3 f. 0, this can also be written as
0:
1
= ao + -,
0:1
where
0:1
1
= -{3 > l.
By applying the same procedure recursively to
0:
0:1,
we get an expansion
1
= ao + - - - - - . 1 - - al
(8.26)
+ -----:;1,-----
where ao E Z, and aI, a2, ... are natural numbers. If 0: is an irrational
number, then the sequence ao, aI, a2, ... never stops (otherwise 0: would be
a rational number). Mainly for typographical reasons, we write (8.26) as
ao E Z, ai E N, for i 2: 1,
(8.27)
and say that (8.27) is the continued fraction expansion of 0:. It is clear by its
very construction that each irrational 0: has unique infinite continued fraction
expansion (8.27).
On the other hand, all rational numbers r have a finite continued fraction
expansion
ao E Z, ai E N for 1
~
n
(8.28)
that can be found by using Euclid's algorithm. But the expansion (8.28) is
never unique, as is shown by the following lemma.
Lemma 8.4.2. If r has expansion (8.28) of odd length, the r also has expansion (8.28) of even length and vice versa.
Proof. This follows straightforwardly from the identity
lao, al,···, an-I, an] = lao, al,···, an-l
1
+ -].
an
If an 2: 2, then lao, al,"" an] = lao, al ... , an - 1,1]. If an
lao, al,···, an-I, 1] = lao, al," ., an-2, an- l + 1].
1, then
o
170
8. Appendix B: Mathematical Background
Example 8.4.2. The expansion (8.25) can be written in two ways:
263
189 = 1 +
1
1
2+
1
1+
1
= 1+
2+
1
1+--1
4+ 8
1
1
1+
1
1+
1
4+-1
7+ 1
which illustrates how we can either lengthen or shorten any finite continued
fraction expansion by 1. It can, however, be shown that if a finite continued
fraction expansion is required to end up by an > 1, then the representation
is unique.
Clearly, each finite continued fraction represents some rational number.
But even though we now know that each irrational number has unique representation as an infinite continued fraction, we cannot tell a priori if each
sequence
where ao E Z and ai E N, when i 2 1
ao, at. a2, ... ,
(8.29)
regarded as a continued fraction represents an irrational number. In other
words, we do not yet know if the limit
(8.30)
exists for each sequence (8.29). We will soon find an answer to that question.
Let (ai) be a sequence as in (8.29). The nth convergent of the sequence
(ai) is defined to be fu
= lao, ... ,an]. A simple calculation shows that
q..
Po
ao
qo
1
PI
aOal + 1
=
ql
al
-=-,
c!;) + 1
P2
aO(al +
- =
q2
al +
I
a2
=
a2PI + Po
a2qI
+ qo
Continuing this way, we see that Pn and qn are polynomials that depend on
ao, at. ... an. We are now interested in finding the polynomials Pn and qn,
so we will regard ai as indeterminates, assuming that they do not take any
specific integer values. After calculating some first Pn and qn, we may guess
that there are general recursion formulae for polynomials Pn and qn·
Lemma 8.4.3. For each n 2 2,
+ Pn-2,
anqn-I + qn-2·
Pn = anPn-1
qn =
hold for each n
2 2.
(8.31 )
(8.32)
8.4 Number Theory
171
Proof. By induction on n. The formulae (8.31) and (8.32) are true for n = 2,
as we saw before. Assume then that they hold for numbers {2, ... ,n}. Then,
Pn+1
- = [ao,···, an-b an, an+l 1= [ao,···, an-b an + -1- 1
qn+l
=
an+l
6
+
)Pn-l + Pn-2
I
(an + -a-)qn-l
+ qn-2
n+l
(an
an+1(anPn-1 + Pn-2)
an+1(anqn-1 + qn-2)
_ an+IPn + Pn-l
,
an+1 qn + qn-l
=
+ Pn-l
+ qn-l
o
so the formulae (8.31) and (8.32) are valid for each n.
If aI, a2, ... are natural numbers, then it follows by (8.32) that qn >
qn-l + qn-2 and the inequality qn ~ Fn , where Fn is the nth Fibonacci
number,7 can be proved by induction. In the sequel we will use notation Pn
and qn also for the Pn and qn evaluated at {ao, ab ... , an}. The meaning of
individual notation Pn or qn will be clear by the context.
Using the recursion formulae, it is also easy to prove the following lemma:
Lemma 8.4.4. For any n ~ 1, Pnqn-l - Pn-Iqn
= (-It-I.
From the previous lemma it also follows that the convergents E.!!. are always
in their lowest terms: for if d divides both Pn and qn, then d al~o divides 1.
Lemma 8.4.4 has even more interesting consequences: By multiplying
(8.33)
by an and using the recursion formulae (8.31) and (8.32) once more, we see
that
(8.34)
whenever n
~
2. Equations (8.33) and (8.34) can be rewritten as
Pn
Pn-l
qn
qn-l
(_I)n-1
qnqn-l
(8.35)
and
(-I)na n
Pn
Pn-2
----=
qn
qnqn-2
qn-2
(8.36)
Equation (8.35) implies inequality
7
Fibonacci numbers Fn are defined by Fo
= H = 1 and by Fn = Fn- 1 + Fn-2.
8. Appendix B: Mathematical Background
172
P2n
q2n
<
P2n-l ,
q2n-l
which, together with (8.36), implies that
P2n-2
q2n-2
<
<
P2n
q2n
P2n-l
q2n-l
<
P2n-3.
q2n-3
(8.37)
Inequalities (8.37) show that the sequence of even convergents is strictly
increasing but bounded above by the sequence of odd convergents which is
strictly decreasing. It follows by (8.35), and by the fact that qn tends to
infinity (recall that qn ? Fn), that both of the sequences converge to a
limit a. Therefore, each sequence (8.29) represents some irrational number
a. Moreover, since the limit a is always between two consecutive convergents,
(8.35) implies that
pn- a 1 <---<2".
1
1
-
I
- qnqn+l
qn
(8.38)
qn
Example 8.4.3. The continued fraction expansion of 11" begins with
11"
= [3,7,15,1,292,1,1, ... J.
Since q4 = 113 and q5 = 292 . q4 + q3 = 33102, ~ = ~~~ is a very good
approximation of 11" with a relatively small nominator. By (8.38),
1
355
113 -
11"
1
:::;
1
113.33102 = 0.000000267342 ....
In fact,
355 -
1 113
11"
1
= 0.000000266764 ... ,
so the estimation given by (8.38) is also quite precise.
Remark 8.4.1. Inequality (8.38) tells us that, for an irrational a, the convergents fu
give infinitely many rational approximations l?q with gcd(p, q) = 1
h
of a so precise that
(8.39)
This has further number theory interest, since the rational numbers have only
finitely many rational approximations (8.39) such that gcd(p, q) = 1. This is
true because, if a = ~ with s > 0, then for q ? s and ~ =1= ~,
~ - ~I =
Iq
s
1 ps
- qr I ?
qs
-.!..
qs
?
-.!...
q2
8.4 Number Theory
173
For 0 < q < s there are clearly only finitely many approximations ~ that
satisfy (8.39) and gcd(p, q) = l.
Thus, the irrational numbers can be characterized by the property that
they have infinitely many rational approximations (8.38). The continued fraction expansion also gives finitely many rational approximations (8.39) for
rational numbers. On the other hand, we will show in a moment (Theorem
8.4.3) that approximations which are good enough are convergents.
It can even be shown that the convergents to a number a are the "best"
rational approximations of a in the following sense. For the proof of the
following theorem, consult [33J.
Theorem 8.4.1. Let ~ be a convergent to a. If 0 < q ~ qn and ~ =I- ~,
then
whenever n
~
2.
Now we will show that approximations which are good enough are convergents. To do that, we must first derive an auxiliary result. Let a =
lao, ab a2 ... J be a continued fraction expansion of a. We define an+l =
[an+b a n+2, ... J and this gives us a formal expression
This expression is not necessarily a continued fraction in the sense that an
need not be a natural number, but anyway it gives us a representation
~
1
(8.40)
with Pnqn-l - Pn-lqn = (_l)n-l just like in the proof of Lemma 8.4.3. But
also any representation looking "enough" like (8.40) implies that !2!.
and Pn-l
qn
qn-l
are convergents to a. More precisely:
Theorem 8.4.2. Let P, Q, P', and Q' be integers such that Q
and PQ' - QP' = ±l. If also a' ~ 1 and
> Q' > 0
a'P+P'
a = --:----:::a'Q + Q"
then QP:
ofa.
=
Pn-l
qn-l
and EQ
= !2!.
are convergents of a continuedfruction expansion
qn
174
8. Appendix B: Mathematical Background
Proof. Anyway, we have a finite continued fraction expansion for
P
Pn
-.
Q = [ao,al ... ,an] =qn
(8.41)
By Lemma 8.4.2, we can choose n to be even or odd, so we can assume,
without loss of generality, that
PQ' - QP' = (-It-I.
(8.42)
Also by Lemma 8.4.4, we have $\gcd(p_n,q_n) = 1$ and, similarly, $\gcd(P,Q) = 1$. Since $Q$ and $q_n$ have the same signs ($Q$ is positive by assumption), it follows by (8.41) that $P = p_n$ and $Q = q_n$. Using this knowledge and Lemma 8.4.4, equation (8.42) can be rewritten as
$$p_nQ' - q_nP' = p_nq_{n-1} - q_np_{n-1},$$
or even as
$$p_n(Q' - q_{n-1}) = q_n(P' - p_{n-1}). \tag{8.43}$$
Since $\gcd(p_n,q_n) = 1$, it follows by (8.43) that $q_n$ divides $Q' - q_{n-1}$.
On the other hand, $Q' - q_{n-1} < Q' < q_n$, since $q_n = Q > Q' > 0$ by assumption. Also $q_{n-1} - Q' < q_{n-1} < q_n$ and, by combining these observations, we see that
$$|Q' - q_{n-1}| < q_n,$$
which can be valid only if $Q' = q_{n-1}$. By (8.43) it also follows that $P' = p_{n-1}$. Hence,
$$\alpha = \frac{\alpha'p_n + p_{n-1}}{\alpha'q_n + q_{n-1}},$$
which can also be written as
$$\alpha = [a_0, a_1, \ldots, a_n, \alpha'], \tag{8.44}$$
just like in the proof of Lemma 8.4.3. Since $\alpha' \ge 1$, there is a continued fraction expansion $\alpha' = [a_{n+1}, a_{n+2}, \ldots]$ such that $a_{n+1} \ge 1$. □
Now we are ready to conclude this section with

Theorem 8.4.3. If
$$0 < \left|\frac{p}{q} - \alpha\right| \le \frac{1}{2q^2},$$
then $\frac{p}{q}$ is a convergent to the continued fraction expansion of $\alpha$.
Proof. By assumption,
$$\frac{p}{q} - \alpha = \sigma\cdot\frac{\theta}{q^2}$$
for some $0 < \theta \le \frac{1}{2}$ and $\sigma = \pm 1$. Let $\frac{p}{q} = [a_0, a_1, \ldots, a_n]$ be the continued fraction expansion for $\frac{p}{q}$. Again by Lemma 8.4.2, we may assume that $\sigma = (-1)^{n-1}$. We define $\alpha'$ by
$$\alpha = \frac{\alpha'p_n + p_{n-1}}{\alpha'q_n + q_{n-1}},$$
where $\frac{p_n}{q_n} = \frac{p}{q}$ and $\frac{p_{n-1}}{q_{n-1}}$ are the last and the second-last convergents to $\frac{p}{q}$. Thus, we have
$$\sigma\cdot\frac{\theta}{q_n^2} = \frac{p_n}{q_n} - \alpha = \frac{p_nq_{n-1} - p_{n-1}q_n}{q_n(\alpha'q_n + q_{n-1})} = \frac{(-1)^{n-1}}{q_n(\alpha'q_n + q_{n-1})}$$
and
$$\frac{\theta}{q_n} = \frac{1}{\alpha'q_n + q_{n-1}}.$$
Thus,
$$\alpha' = \frac{1}{\theta} - \frac{q_{n-1}}{q_n} \ge 2 - 1 = 1.$$
By Theorem 8.4.2, $\frac{p_{n-1}}{q_{n-1}}$ and $\frac{p_n}{q_n} = \frac{p}{q}$ are consecutive convergents to $\alpha$. □
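Theorem 8.4.3 also yields an effective procedure: an unknown rational number with a bounded denominator can be recovered from any sufficiently close approximation by expanding the approximation into a continued fraction and scanning its convergents, which is how such a denominator is extracted in the classical post-processing of period finding. A Python sketch (the function names are ours):

```python
from fractions import Fraction

def partial_quotients(x):
    """Partial quotients [a0, a1, ...] of a rational x by floor-and-invert;
    the loop terminates because x is rational."""
    terms = []
    while True:
        a = x.numerator // x.denominator
        terms.append(a)
        if x == a:
            return terms
        x = 1 / (x - a)

def recover_rational(approx, max_q):
    """Return the last convergent p/q of `approx` with q <= max_q satisfying
    |p/q - approx| <= 1/(2q^2); by Theorem 8.4.3, any rational that close
    to `approx` must occur among its convergents."""
    terms = partial_quotients(approx)
    p_prev, q_prev, p, q = 1, 0, terms[0], 1
    best = None
    for k in range(len(terms)):
        if k > 0:  # advance to the next convergent
            p_prev, p = p, terms[k] * p + p_prev
            q_prev, q = q, terms[k] * q + q_prev
        if q > max_q:
            break
        if abs(Fraction(p, q) - approx) <= Fraction(1, 2 * q * q):
            best = Fraction(p, q)
    return best

# 1365/4096 is a measurement-style approximation of some k/r with r <= 15:
print(recover_rational(Fraction(1365, 4096), max_q=15))   # prints 1/3
```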
8.5 Shannon Entropy and Information
The purpose of this section is to present, following [14], the very elementary concepts of information theory. In the beginning, we concentrate on information represented by binary digits: bits.
8.5.1 Entropy
By using a single bit we can represent two different configurations: the bit is either 0 or 1. Using two bits, four different configurations are possible, three bits allow eight configurations, etc. Inspired by these initial observations, we say that the elementary binary entropy of a set with $n$ elements is $\log_2 n$. The elementary binary entropy thus approximately measures the length of the bit strings which we need to label all the elements of the set.
Remark 8.5.1. Consider the problem of specifying one single element out of $n$ given elements such that the probability of being the outcome is $\frac{1}{n}$ for each element. The elementary binary entropy $\log_2 n$ thus measures the initial uncertainty in bits, i.e., to specify the outcome, we have to gain $\log_2 n$ bits of information. In other words, we have to obtain the binary string of $\log_2 n$ bits that labels the outcome.
On the other hand, using an alphabet of 3 symbols, we would say that the elementary ternary uncertainty of a set of $n$ elements is $\log_3 n$. To get a more unified approach, we define the elementary entropy of a set of $n$ elements to be
$$H(n) = K\log n,$$
where $K$ is a constant and $\log$ is the natural logarithm. Choosing $K = \frac{1}{\log 2}$ gives the elementary binary entropy, $K = \frac{1}{\log 3}$ gives the ternary, etc.
However, it feels well justified to say that the information needed on average is less than $K\log n$ if the elements appear as outcomes with a non-uniform probability distribution. For example, if we have a priori knowledge that some element does not appear at all, we can say that the information needed, i.e., the initial uncertainty of the outcome, is at most $K\log(n-1)$. How do we deal with a set of $n$ elements, each appearing with a known probability $p_i$?
Assume that we choose $l$ times an element of a set having $n$ elements with uniform probability distribution. This experiment can also be viewed as specifying a string of length $l$ over an $n$-letter alphabet. There are $n^l$ such strings, and since the letters appear with equal probabilities, the strings do as well. Thus, the elementary entropy of the string set is
$$H(n^l) = K\log n^l = l\cdot K\log n = l\cdot H(n),$$
which is exactly what we could expect: the elementary entropy of strings of length $l$ is $l$ times the elementary entropy of a single letter. We use a similar idea to handle the non-uniform probability distributions.
Let us choose $l$ elements of an $n$-element set having probability distribution $p_1, \ldots, p_n$. Then, with large enough $l$, the $i$th letter in any such string should appear approximately $k_i = p_il$ times. Let us study a simplified case where each $k_i = p_il$ is an integer. A simple combinatorial calculation shows that there are exactly
$$\frac{l!}{k_1!\cdots k_n!}$$
strings where the $i$th letter occurs $k_i = p_il$ times. Moreover, the distribution of such strings tends to the uniform distribution as $l$ tends to infinity. The elementary entropy of the set of such strings is
$$K\log\frac{l!}{k_1!\cdots k_n!} = K(\log l! - \log k_1! - \cdots - \log k_n!).$$
By using $\log k! = k\log k - k + O(\log k)$ and $l = k_1 + \cdots + k_n$, we can write the average entropy (per letter) as
$$\begin{aligned}
H(p_1,\ldots,p_n) &= \frac{1}{l}\cdot K\Big(l\log l - l + O(\log l) - \sum_{i=1}^{n}(k_i\log k_i - k_i + O(\log k_i))\Big)\\
&= K\cdot\frac{1}{l}\Big(\sum_{i=1}^{n}k_i\log l + O(\log l) - \sum_{i=1}^{n}k_i\log k_i + O(\log l)\Big)\\
&= -K\sum_{i=1}^{n}\frac{k_i}{l}\log\frac{k_i}{l} + O\Big(\frac{\log l}{l}\Big) = -K\sum_{i=1}^{n}p_i\log p_i + O\Big(\frac{\log l}{l}\Big).
\end{aligned}$$
By letting $l \to \infty$, the last term tends to $0$. This encourages the following definition.
Definition 8.5.1. Suppose that the elements of an $n$-element set occur with probabilities $p_1, \ldots, p_n$. The Shannon entropy of the distribution $p_1, \ldots, p_n$ is defined to be
$$H(p_1,\ldots,p_n) = -K(p_1\log p_1 + \cdots + p_n\log p_n).$$
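The derivation above can also be checked numerically. The Python sketch below (ours, taking $K = 1$, i.e., natural logarithms) compares $-\sum_i p_i\log p_i$ with the per-letter elementary entropy $\frac{1}{l}\log\frac{l!}{k_1!\cdots k_n!}$ for growing $l$:

```python
from math import lgamma, log

def shannon_entropy(ps):
    """H(p_1, ..., p_n) = -sum p_i log p_i, taking K = 1 (natural log)."""
    return -sum(p * log(p) for p in ps if p > 0)

def per_letter_entropy(ps, l):
    """(1/l) * log( l! / (k_1! ... k_n!) ) with k_i = p_i * l; uses
    lgamma(k + 1) = log k! to avoid computing huge factorials."""
    ks = [round(p * l) for p in ps]
    return (lgamma(l + 1) - sum(lgamma(k + 1) for k in ks)) / l

ps = [0.5, 0.25, 0.25]
print(shannon_entropy(ps))               # about 1.0397
for l in (100, 10_000, 1_000_000):
    print(l, per_letter_entropy(ps, l))  # approaches the same value as l grows
```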
Remark 8.5.2. Because $\lim_{p\to 0}p\log p = 0$, we define $0\cdot\log 0 = 0$. Clearly, $H(p_1,\ldots,p_n) \ge 0$.

Remark 8.5.3. If $K = \frac{1}{\log 2}$ in the above definition, we talk about binary Shannon entropy. The special case that $p_i = \frac{1}{n}$ for each $i$ yields $H(n) = -Kn\cdot\frac{1}{n}\log\frac{1}{n} = K\log n$, again giving the elementary entropy, which we could now call uniform entropy as well.
Remark 8.5.4. Recall that the elementary binary entropy measures how many bits of information we have to receive in order to specify an element in a set of $n$ elements with uniform probability distribution. In other words, it measures the uncertainty of the outcome in bits. On the other hand, binary Shannon entropy measures the average uncertainty in bits in a set with a probability distribution $p_1, \ldots, p_n$.
Since $\log$ is a concave function,⁸ it follows easily that for any probability distribution $p_1, \ldots, p_n$, the inequality

⁸ A real function $f$ is concave in an interval $I$ if the graph of $f$ is above the chord connecting any points $f(x_1)$ and $f(x_2)$, $x_1, x_2 \in I$. Formally, a concave function $f$ must satisfy $f(\lambda x_1 + (1-\lambda)x_2) \ge \lambda f(x_1) + (1-\lambda)f(x_2)$ for $\lambda \in [0,1]$ and $x_1$, $x_2 \in I$.
$$p_1\log x_1 + \cdots + p_n\log x_n \le \log(p_1x_1 + \cdots + p_nx_n)$$
holds true. Therefore,
$$H(p_1,\ldots,p_n) = p_1\log p_1^{-1} + \cdots + p_n\log p_n^{-1} \le \log(p_1p_1^{-1} + \cdots + p_np_n^{-1}) = \log n.$$
Notice that the Shannon entropy $H(p_1,\ldots,p_n)$ also attains the above upper bound at $p_1 = \cdots = p_n = \frac{1}{n}$. Moreover, Shannon entropy has the following properties:
1. $H(p_1,\ldots,p_n)$ is a symmetric continuous function.
2. $H(\frac{1}{n},\ldots,\frac{1}{n})$ is a non-negative, strictly increasing function of the variable $n$.
3. If $(p_1,\ldots,p_n)$ and $(q_1,\ldots,q_m)$ are probability distributions, then
$$H(p_1,\ldots,p_n) + p_nH(q_1,\ldots,q_m) = H(p_1,\ldots,p_{n-1},p_nq_1,\ldots,p_nq_m).$$
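Condition 3, the grouping property, is the one that drives the proof below; it can be verified directly from Definition 8.5.1, and a quick numerical check in Python (our own illustration, taking the binary choice $K = 1/\log 2$) makes it concrete:

```python
from math import log2

def H(*ps):
    """Binary Shannon entropy (K = 1/log 2), with 0 log 0 = 0."""
    return -sum(p * log2(p) for p in ps if p > 0)

p1, p2, p3 = 0.5, 0.3, 0.2          # (p_1, ..., p_n) with n = 3
q1, q2 = 0.25, 0.75                 # (q_1, ..., q_m) with m = 2

lhs = H(p1, p2, p3) + p3 * H(q1, q2)
rhs = H(p1, p2, p3 * q1, p3 * q2)   # p_n split into p_n*q_1, ..., p_n*q_m
print(lhs, rhs)                     # equal up to rounding
```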
We will make the definition of Shannon entropy even more natural by showing that the converse also holds:

Theorem 8.5.1. If $H(p_1,\ldots,p_n)$ satisfies the conditions 1-3 above, then $H(p_1,\ldots,p_n) = -K(p_1\log p_1 + \cdots + p_n\log p_n)$ for some positive constant $K$.
Proof. We will first prove an auxiliary result that will be of great help: if $f$ is a strictly increasing, non-negative function defined on the natural numbers such that $f(s^m) = mf(s)$, then $f(s) = K\log s$ for some positive constant $K$. To prove this, we take an arbitrary natural number $n$ and fix $m$ such that $s^m \le 2^n < s^{m+1}$. Then
$$\frac{m}{n} \le \frac{\log 2}{\log s} < \frac{m}{n} + \frac{1}{n}. \tag{8.45}$$
Because $f$ is strictly increasing, we also have $f(s^m) \le f(2^n) < f(s^{m+1})$, which implies that
$$\frac{m}{n} \le \frac{f(2)}{f(s)} < \frac{m}{n} + \frac{1}{n}. \tag{8.46}$$
Inequalities (8.45) and (8.46) together imply that
$$\left|\frac{f(2)}{f(s)} - \frac{\log 2}{\log s}\right| < \frac{1}{n}.$$
Since $n$ can be chosen arbitrarily great, we must have $f(s) = \frac{f(2)}{\log 2}\log s = K\log s$. The constant $K = \frac{f(2)}{\log 2}$ is positive, since $f$ is strictly increasing and $f(1) = f(1^0) = 0\cdot f(1) = 0$.
For any natural number $n$, we define $f(n) = H(\frac{1}{n},\ldots,\frac{1}{n})$. We will demonstrate that $f(s^m) = mf(s)$. The symmetry of $H$ and condition 3 imply that
$$H\Big(\frac{1}{s^m},\ldots,\frac{1}{s^m}\Big) = H\Big(\frac{1}{s},\ldots,\frac{1}{s}\Big) + s\cdot\frac{1}{s}H\Big(\frac{1}{s^{m-1}},\ldots,\frac{1}{s^{m-1}}\Big),$$
which is to say that $f(s^m) = f(s) + f(s^{m-1})$. The claim $f(s^m) = mf(s)$ follows now by induction. Since $f$ is strictly increasing and non-negative by condition 2, it follows that $f(s) = K\log s$ for some positive constant $K$.
Let $p_1, \ldots, p_n$ be some rational numbers such that $p_1 + \cdots + p_n = 1$. We can assume that $p_i = \frac{m_i}{N}$, where $N = m_1 + \cdots + m_n$. Using condition 3 we get
$$\begin{aligned}
K\log N &= H\Big(\frac{1}{N},\ldots,\frac{1}{N}\Big)
= H\Big(\underbrace{p_1\cdot\frac{1}{m_1},\ldots,p_1\cdot\frac{1}{m_1}}_{m_1},\underbrace{p_2\cdot\frac{1}{m_2},\ldots,p_2\cdot\frac{1}{m_2}}_{m_2},\ldots,\underbrace{p_n\cdot\frac{1}{m_n},\ldots,p_n\cdot\frac{1}{m_n}}_{m_n}\Big)\\
&= H(p_1,\ldots,p_n) + \sum_{i=1}^{n}p_iH\Big(\frac{1}{m_i},\ldots,\frac{1}{m_i}\Big)
= H(p_1,\ldots,p_n) + K\sum_{i=1}^{n}p_i\log m_i.
\end{aligned}$$
Hence,
$$H(p_1,\ldots,p_n) = -K\Big(\sum_{i=1}^{n}p_i\log m_i - \log N\Big) = -K\Big(\sum_{i=1}^{n}p_i\log m_i - \sum_{i=1}^{n}p_i\log N\Big) = -K\sum_{i=1}^{n}p_i\log p_i,$$
which demonstrates that the theorem is true for rational probabilities. Since $H$ is continuous, the result extends to real probabilities straightforwardly. □
8.5.2 Information
We will hereafter ignore the constant K in the definition of entropy.
Let $X = \{x_1,\ldots,x_n\}$ be a set of $n$ elements, and $p(x_1), \ldots, p(x_n)$ the corresponding probabilities of the elements to occur. Similarly, let $Y$ be a set with $m$ elements with a probability distribution $p(y_1), \ldots, p(y_m)$. Such sets are here identified with discrete random variables: we regard $X$ as a variable which has $n$ potential values $x_1, \ldots, x_n$, any such with probability $p(x_i)$.

The joint entropy of the sets (random variables) $X$ and $Y$ is simply defined as the entropy of the set $X\times Y$, where the corresponding probability distribution is the joint distribution $p(x_i,y_j)$:
$$H(X,Y) = -\sum_{i=1}^{n}\sum_{j=1}^{m}p(x_i,y_j)\log p(x_i,y_j).$$
The conditional entropy of $X$, when the value of $Y$ is known to be $y_j$, is defined by merely replacing the distribution $p(x_i)$ with the conditional probability distribution $p(x_i \mid y_j)$:
$$H(X \mid y_j) = -\sum_{i=1}^{n}p(x_i \mid y_j)\log p(x_i \mid y_j).$$
The conditional entropy of $X$ when $Y$ is known is defined as the expected value
$$H(X \mid Y) = \sum_{j=1}^{m}p(y_j)H(X \mid y_j).$$
By using the concavity of the function log, one can quite easily prove the
following lemma.
Lemma 8.5.1. $H(X \mid Y) \le H(X)$.
Remark 8.5.5. The above lemma is quite natural: it states that the knowledge
of Y cannot increase the uncertainty about X.
Conditional and joint entropy are connected by the following lemma, which has a straightforward proof if one keeps in mind that $p(x_i,y_j) = p(x_i \mid y_j)p(y_j)$.

Lemma 8.5.2. $H(X \mid Y) = H(X,Y) - H(Y)$.
Finally, we have all the necessary tools for defining the information of $X$ when $Y$ is known.

Definition 8.5.2. $I(X \mid Y) = H(X) - H(X \mid Y)$.

Definition 8.5.2 contains the natural idea that the knowledge that $Y$ can provide about $X$ is merely the uncertainty of $X$ minus the uncertainty of $X$ provided $Y$ is known. According to Lemma 8.5.1, $I(X \mid Y) \ge 0$ always, and trivially $I(X \mid Y) \le H(X)$.
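These definitions are easy to exercise on a small example. The sketch below (the joint distribution is our own toy choice) computes $H(X)$, $H(X \mid Y)$ via Lemma 8.5.2, and $I(X \mid Y)$, and confirms numerically that the same value results with the roles of $X$ and $Y$ exchanged, anticipating Lemma 8.5.4 below:

```python
from math import log2
from collections import defaultdict

# A toy joint distribution p(x, y) on two binary random variables.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H(dist):
    """Binary Shannon entropy of a dictionary of probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

px, py = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():     # marginal distributions
    px[x] += p
    py[y] += p

H_X, H_Y, H_XY = H(px), H(py), H(joint)
H_X_given_Y = H_XY - H_Y            # Lemma 8.5.2
I_X_given_Y = H_X - H_X_given_Y     # Definition 8.5.2

print(H_X, H_X_given_Y, I_X_given_Y)
# Symmetry (Lemma 8.5.4): I(X|Y) = I(Y|X) = H(X) + H(Y) - H(X,Y).
assert abs(I_X_given_Y - (H_X + H_Y - H_XY)) < 1e-12
```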
To end this section, we list some properties of entropy and information. The proofs of the following lemmata are left as exercises.

Lemma 8.5.3. $H(X,Y) \le H(X) + H(Y)$.

Lemma 8.5.4. Information is symmetric: $I(X \mid Y) = I(Y \mid X)$.
The above lemma justifies the terminology mutual information of $X$ and $Y$.

For the following lemma, we need some extra terminology: if $X$ and $Y$ are random variables, we say that $X$ (randomly) depends on $Y$ if there are conditional probabilities $p(x_i \mid y_j)$ such that
$$p(x_i) = \sum_{j=1}^{m}p(x_i \mid y_j)p(y_j).$$
Lemma 8.5.5. Let $X$ be a random variable that depends on $Y$, which is a random variable depending on $Z$. If $X$ is conditionally independent of $Z$, then $I(Z \mid X) \le I(Z \mid Y)$.
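Lemma 8.5.5 is a form of the data processing inequality: the end of a chain $Z \to Y \to X$ cannot carry more information about $Z$ than the middle does. A Monte Carlo sanity check in Python (the chain below, two noisy bit-flips, is our own illustration):

```python
import random
from math import log2
from collections import Counter

random.seed(1)

def flip(bit, p):
    """Flip a bit with probability p (a noisy channel)."""
    return bit ^ (random.random() < p)

samples = []
for _ in range(200_000):            # sample the chain Z -> Y -> X
    z = random.randint(0, 1)
    y = flip(z, 0.1)                # Y depends on Z
    x = flip(y, 0.2)                # X depends on Z only through Y
    samples.append((z, y, x))

def information(pairs):
    """Estimate I(A|B) = H(A) + H(B) - H(A,B) from a list of sample pairs."""
    n = len(pairs)
    def H(counts):
        return -sum(c / n * log2(c / n) for c in counts.values())
    return (H(Counter(a for a, b in pairs)) + H(Counter(b for a, b in pairs))
            - H(Counter(pairs)))

I_ZX = information([(z, x) for z, y, x in samples])
I_ZY = information([(z, y) for z, y, x in samples])
print(I_ZX, I_ZY)                   # I(Z|X) <= I(Z|Y), as the lemma predicts
```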
8.6 Exercises
1. Prove that a set $S \subseteq V$ is linearly dependent if and only if for some element $x \in S$, $x \in L(S\setminus\{x\})$.
2. Show that if $B$ and $B'$ are two bases of $H_n$, then necessarily $|B| = |B'|$.
3. a) Let $n \ge 2$ and $x_1, x_2 \in H_n$. Show that $y \in H_n$ can be chosen in such a way that $L(x_1,y) = L(x_1,x_2)$ and $(x_1 \mid y) = 0$.
   b) Generalize a) into a procedure for finding an orthonormal basis of $H_n$ (the procedure is called the Gram-Schmidt method).
4. Prove that the function $f:\mathbb{N}\to\mathbb{C}$ defined by $f(n) = \frac{1}{n}$ is in $L_2(\mathbb{C})$ but cannot be expressed as a linear combination of $E = \{e_1, e_2, e_3, \ldots\}$.
5. Prove the parallelogram rule: in an inner product space $V$, the equation
$$\|x+y\|^2 + \|x-y\|^2 = 2\|x\|^2 + 2\|y\|^2$$
holds for any $x, y \in V$.
6. Show that the subspace $W$ of $L_2(\mathbb{C})$ generated by the vectors $\{e_1, e_2, e_3, \ldots\}$ is not a Hilbert space (cf. Example 8.3.4).
7. Prove Lemma 8.4.1.
8. Prove Lemma 8.4.4.
9. Prove Lemmata 8.5.3 and 8.5.4.
10. Use the properties of the function $\log$ to prove Lemma 8.5.5.
References
1. Andris Ambainis: A note on quantum black-box complexity of almost all Boolean
functions, Information Processing Letters 71, 5-7 (1999). Electronically available at quant-ph/9811080. 9
2. E. Bach and J. Shallit: Computational number theory, MIT Press (1996).
3. Adriano Barenco, Charles H. Bennett, Richard Cleve, David P. DiVincenzo,
Norman Margolus, Peter Shor, Tycho Sleator, John Smolin, and Harald Weinfurter: Elementary gates for quantum computation, Physical Review A 52:5,
3457-3467 (1995). Electronically available at quant-ph/9503016.
4. R. Beals, H. Buhrman, R. Cleve, M. Mosca, and R. de Wolf: Quantum lower
bounds by polynomials, Proceedings of the 39th Annual IEEE Symposium on
Foundations of Computer Science - FOCS, 352-361 (1998). Electronically available at quant-ph/9802049.
5. P. A. Benioff: Quantum mechanical Hamiltonian models of discrete processes
that erase their own histories: application to Turing machines, International
Journal of Theoretical Physics 21:3/4,177-202, (1982).
6. Charles H. Bennett: Logical reversibility of computation, IBM Journal of Research and Development 17, 525-532 (1973).
7. Charles H. Bennett: Time/space trade-offs for reversible computation, SIAM
Journal of Computing 18, 766-776 (1989).
8. Charles H. Bennett, Ethan Bernstein, Gilles Brassard, and Umesh V. Vazirani:
Strengths and weaknesses of quantum computation, SIAM Journal of Computing
26:5, 1510-1523 (1997). Electronically available at quant-ph/9701001.
9. Ethan Bernstein and Umesh Vazirani: Quantum complexity theory, SIAM Journal of Computing 26:5, 1411-1473 (1997).
10. Andre Berthiaume and Gilles Brassard: Oracle quantum computing, Proceedings of the Workshop on Physics and Computation - PhysComp'92, IEEE Press,
195-199 (1992).
11. Michel Boyer, Gilles Brassard, Peter Hoyer, and Alain Tapp: Tight bound on quantum searching, Fourth Workshop on Physics and Computation - PhysComp'96, Eds.: T. Toffoli, M. Biafore, J. Leão, New England Complex Systems Institute, 36-43 (1996). Electronically available at quant-ph/9605034.
12. Gilles Brassard and Peter Hoyer: An exact quantum polynomial-time algorithm
for Simon's problem, Proceedings of the 1997 Israeli Symposium on Theory of
Computing and Systems - ISTCS'97, 12-23 (1997). Electronically available at
quant-ph/9704027.
13. Gilles Brassard, Peter Hoyer, and Alain Tapp: Quantum counting, Automata,
Languages and Programming, Proceedings of the 25th International Colloquium, ICALP'98, Lecture Notes in Computer Science 1443, 820-831, Springer
(1998). Electronically available at quant-ph/9805082.
⁹ Code "quant-ph/9811080" refers to http://xxx.lanl.gov/abs/quant-ph/9811080 at the Los Alamos preprint archive.
14. Leon Brillouin: Science and information theory, 2nd edition, Academic Press
(1967).
15. Paul Busch, Pekka J. Lahti, and P. Mittelstaedt: The quantum theory of measurement, Springer-Verlag, 1991.
16. A. R. Calderbank, Peter W. Shor: Good quantum error-correcting codes exist, Physical Review A 54, 1098-1105 (1996). Electronically available at quant-ph/9512032.
17. Man-Duen Choi: Completely positive linear maps on complex matrices, Linear
Algebra and its Applications 10, 285-290 (1975).
18. Isaac L. Chuang, Lieven M. K. Vandersypen, Xinlan Zhou, Debbie W. Leung,
and Seth Lloyd: Experimental realization of a quantum algorithm, Nature 393,
143-146 (1998). Electronically available at quant-ph/9801037.
19. Juan I. Cirac and Peter Zoller: Quantum computations with cold trapped ions,
Physical Review Letters 74:20, 4091-4094 (1995).
20. Henri Cohen: A course in computational algebraic number theory, Graduate
Texts in Mathematics 138, Springer (1993), 4th printing 2000.
21. Wim van Dam: Two classical queries versus one quantum query, electronically available at quant-ph/9806090.
22. David Deutsch: Uncertainty in quantum measurements, Physical Review Letters 50:9, 631-633 (1983).
23. David Deutsch: Quantum theory, the Church-Turing principle and the universal quantum computer, Proceedings of the Royal Society of London A 400, 97-117 (1985).
24. David Deutsch: Quantum computational networks, Proceedings of the Royal
Society of London A 425, 73-90 (1989).
25. David Deutsch, Adriano Barenco, and Artur Ekert: Universality in quantum
computation, Proceedings of the Royal Society of London A 449, 669-677 (1995).
Electronically available at quant-ph/9505018.
26. David Deutsch, Richard Jozsa: Rapid solutions of problems by quantum computation, Proceedings of the Royal Society of London A 439, 553-558 (1992).
27. Christoph Dürr and Peter Hoyer: A quantum algorithm for finding the minimum, electronically available at quant-ph/9607014.
28. Mark Ettinger and Peter Hoyer: On quantum algorithms for noncommutative
hidden subgroups, Proceedings of the 16th Annual Symposium on Theoretical
Aspects of Computer Science - STACS 99, Lecture Notes in Computer Science
1563, 478-487, Springer (1999). Electronically available at quant-ph/9807029.
29. Edward Farhi, Jeffrey Goldstone, Sam Gutmann, and Michael Sipser: A limit
on the speed of quantum computation in determining parity, Physical Review
Letters 81:5, 5442-5444 (1998). Electronically available at quant-ph/9802045.
30. Richard P. Feynman: Simulating physics with computers, International Journal
of Theoretical Physics 21:6/7, 467-488 (1982).
31. Lov K. Grover: A fast quantum-mechanical algorithm for database search, Proceedings of the 28th Annual ACM Symposium on the Theory of Computing STOC, 212-219 (1996). Electronically available at quant-ph/9605043.
32. Josef Gruska: Quantum Computing, McGraw-Hill (1999).
33. G. H. Hardy and E. M. Wright: An introduction to the theory of numbers, 4th
ed. with corrections, Clarendon Press, Oxford (1971).
34. Mika Hirvensalo: On quantum computation, Ph.Lic. Thesis, University of
Turku, 1997.
35. Mika Hirvensalo: The reversibility in quantum computation theory, Proceedings of the 3rd International Conference Developments in Language Theory DLT'97, Ed.: Symeon Bozapalidis, Aristotle University of Thessaloniki, 203-210
(1997).
36. Loo Keng Hua: Introduction to number theory, Springer-Verlag, 1982.
37. E. Knill, R. Laflamme, R. Martinez, C.-H. Tseng: An algorithmic benchmark
for quantum information processing, Nature 404: 368-370 (2000).
38. Rolf Landauer: Irreversibility and heat generation in the computing process,
IBM Journal of Research and Development 5, 183-191 (1961).
39. M. Y. Lecerf: Récursive insolubilité de l'équation générale de diagonalisation de deux monomorphismes de monoïdes libres φx = ψx, Comptes Rendus de l'Académie des Sciences 257, 2940-2943 (1963).
40. Ming Li, John Tromp and Paul Vitanyi: Reversible simulation of irreversible
computation, Physica D 120:1/2, 168-176 (1998). Electronically available at
quant-ph/9703009.
41. Seth Lloyd: A potentially realizable quantum computer, Science 261,1569-1571
(1993).
42. Hans Maassen and J. B. M. Uffink: Generalized entropic uncertainty relations,
Physical Review Letters 60:12, 1103-1106 (1988).
43. F. J. MacWilliams and Neil J. A. Sloane: The theory of error-correcting codes,
North-Holland (1981).
44. Gary L. Miller: Riemann's hypothesis and tests for primality, Journal of Computer and System Sciences 13, 300-317 (1976).
45. Michele Mosca and Artur Ekert: The hidden subgroup problem and eigenvalue
estimation on a quantum computer, Quantum Computing and Quantum Communications, Proceedings of the 1st NASA International Conference, Lecture
Notes in Computer Science 1509,174-188, Springer (1998). Electronically available at quant-ph/9903071.
46. A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone: Handbook of applied cryptography, Series on Discrete Mathematics and Its Applications, CRC Press (1997).
47. Masanao Ozawa: Quantum Turing machines: local transitions, preparation,
measurement, and halting problem, electronically available at quant-ph/9809038
(1998).
48. Christos H. Papadimitriou: Computational complexity, Addison-Wesley (1994).
49. K. R. Parthasarathy: An introduction to quantum stochastic calculus, Birkhäuser, Basel (1992).
50. R. Paturi: On the degree of polynomials that approximate symmetric Boolean functions, Proceedings of the 28th Annual ACM Symposium on the Theory of Computing - STOC, 468-474 (1992).
51. Max Planck: Annalen der Physik 1, 69, 1900; Verhandlg. dtsch. phys. Ges., 2,
202; Verhandlg. dtsch. phys. Ges. 2, 237; Annalen der Physik 4, 553, 1901.
52. M. B. Plenio and P. L. Knight: Realistic lower bounds for the factorization time
of large numbers on a quantum computer, Physical Review A 53:5, 2986-2990
(1996). Electronically available at quant-ph/9512001.
53. E. L. Post: The two-valued iterative systems of mathematical logic, Princeton
University Press (1941).
54. E. L. Post: A variant of a recursively unsolvable problem, Bulletin of the American Mathematical Society 52, 264-268 (1946).
55. John Preskill: Robust solutions to hard problems, Nature 391, 631-632 (1998).
56. Marcel Riesz: Sur les maxima des formes bilinéaires et sur les fonctionnelles linéaires, Acta Mathematica 49, 465-497 (1926).
57. Yurii Rogozhin: On the notion of universality and small universal Turing machines, Theoretical Computer Science 168, 215-240 (1996).
58. Sheldon M. Ross: Introduction to probability models, 4th edition, Academic
Press (1985).
59. J. Barkley Rosser and Lowell Schoenfeld: Approximate formulas for some functions of prime numbers, Illinois Journal of Mathematics 6:1, 64-94 (1962).
60. Walter Rudin: Functional Analysis, 2nd edition, McGraw-Hill (1991).
61. Keijo Ruohonen: Reversible machines and Post's correspondence problem for biprefix morphisms, EIK - Journal of Information Processing and Cybernetics 21:12, 579-595 (1985).
62. Arto Salomaa: Public-key cryptography, Texts in Theoretical Computer Science
- An EATCS Series, 2nd ed., Springer (1996).
63. Peter W. Shor: Algorithms for quantum computation: discrete log and factoring,
Proceedings of the 35th Annual IEEE Symposium on Foundations of Computer
Science - FOCS, 20-22 (1994).
64. Peter W. Shor: Scheme for reducing decoherence in quantum computer memory,
Physical Review A 52:4, 2493-2496 (1995).
65. Daniel R. Simon: On the power of quantum computation, Proceedings of the
35th Annual IEEE Symposium on Foundations of Computer Science - FOCS,
116-123 (1994).
66. Douglas R. Stinson: Cryptography - Theory and practice, Series on Discrete
Mathematics and Its Applications, CRC Press, Boca Raton (1995).
67. W. Tittel, J. Brendel, H. Zbinden, N. Gisin: Violation of Bell inequalities by
photons more than 10 km apart, Physical Review Letters 81:17, 3563-3566,
(1998). Electronically available at quant-ph/9806043.
68. Tommaso Toffoli: Bicontinuous extensions of invertible combinatorial functions,
Mathematical Systems Theory 14, 13-23 (1981).
69. B. L. van der Waerden: Sources of quantum mechanics, North-Holland (1967).
70. C. P. Williams and S. H. Clearwater: Explorations in quantum computing,
Springer (1998).
71. C. P. Williams and S. H. Clearwater: Ultimate zero and one. Computing at the
quantum frontier, Springer (2000).
72. William K. Wootters, Wojciech H. Zurek: A single quantum cannot be cloned,
Nature 299, 802-803 (1982).
73. Andrew Chi-Chih Yao: Quantum circuit complexity, Proceedings of the 34th Annual IEEE Symposium on Foundations of Computer Science - FOCS, 352-361 (1993).
Index
BPP 18-20, 31
BQP 31
EQP 31
NP 18, 31
NQP 31
P 16, 20
R 15
RE 15
RP 18-20, 31
ZPP 18-20, 31
accepting computation 14
adjoint matrix 8
adjoint operator 108
algorithm 15
alphabet 13
amplitude 7, 106, 115
basis 163
basis state 8,115
basis states VI
Benioff, Paul 1
Bennett, Charles 32
Bernstein, Ethan 1, 33
Bezout's identity 150
bijective morphism 152
binary alphabet 34
binary quantum gate 24
black body 103
blackbox function 74-76
Bohr, Niels 104
Boltzmann's constant 32
Boolean circuit 15, 34
Born, Max 105
bra-vector 126
Cauchy-Schwartz inequality 165
causality principle 116,135
character 154
character group 154
Chinese Remainder Theorem 150
Church-Turing thesis 15
circuit 13
commutator 121
complete 107,166
completely positive 135, 136
compound systems 115
computation 14
computational basis 22
computational step 14
concatenation 13
concave function 177
conditional entropy 180
configuration 13,21
configuration space 17
congruence 147
constructive interference 106
continued fraction 169
controlled not 25
convergent 170
convex combination 127
coset 146
coset product 149
cyclic group 154
de Broglie, Louis 105
decidability 15
decision problem 15
decomposable 24, 115
decomposable state 11
degenerate 110
density matrix 126
density operator 127
destructive interference 106
deterministic Turing machine 13
Deutsch, David 1,2,33
dimension 163
Dirac, Paul 105
direct product 153, 155
direct sum 155,166
discrete Fourier transform 158
discrete logarithm 69
distance 166
distance function 166
dual group 154
dual space 126
eigenspace 110
eigenvector 108
Einstein, Albert 104
elementary row operations 70
entangled 24, 115
entangled state 11
entropic uncertainty relation 123
entropy 175
epimorphism 152
EPR pair 25
equivalent states 115
Euclid's algorithm 167
Euler's φ-function 150
Euler's constant 59
Euler's theorem 151
expected value 120
factor group 148
fast Fourier transform 49
Fermat's little theorem 151
Feynman, Richard 1,27,33
Fibonacci numbers 171
Fourier transform 41,42,49,154,
158-161
Fourier transform decomposition 43
Fréchet, Maurice 126
functional 126
Gauss-Jordan elimination 71
general linear group 146
generator matrix 64
Gram-Schmidt method 181
group axioms 145
group morphisms 152
group theory 145
Grover's search algorithm 73,87
Hadamard matrix 44
Hadamard transform 44,159
Hadamard-Walsh matrix 23, 113
Hadamard-Walsh transform 44, 159
halting computation 14
Hamilton operator 113,118
Hamming weight 95
Heisenberg picture 135
Heisenberg's uncertainty principle 122
Heisenberg, Werner 105
Hertz 104
Hilbert space 7,11,22,24,27,166
image 152
index 147
Information 175
information VI
injective morphism 152
inner product 156,164
inner product space 164
internal control 13
inverse element 145
inverse Fourier transform 160
inversion 145
inversion about average 82
Jordan, Paul 105
kernel 152
ket-vector 126
Kronecker product 25
Lagrange's theorem 147
Landauer, Rolf 32
Las Vegas algorithm 19
Lecerf, Yves 32
letter 13
linear combination 162
linear dependence 162
linear mapping 166
literal 73
MacWilliams, F. J. 2
Markov chain 6
Markov matrix 6
Markov, Andrei 6
Maxwell, James Clerk 104
mixed state 4, 127
monomorphism 152
Monte Carlo algorithm 19
multitape Turing machine 20
natural basis 163
natural numbers 145
neutral element 145
No-cloning Theorem 27
nondeterministic Turing machine 18
norm 108, 156, 165
normal subgroup 147
normed space 165
observable 3, 118
observation VI, 22, 24
operator 108, 166
opposite element 145
order 149
orthogonal complement 108, 165
orthogonality 164
orthogonality relations 158
parallelogram rule 166
Parseval's identity 42,159
Pauli, Wolfgang 105
period 161
permutation matrix 38
phase shift matrix 114
phase space 3
photoelectric effect 104
photon 104
pivot 71
Planck's constant 103
Planck, Max 103
polarization equation 117
positive operator 109
positive operator-valued measure 120
Post, Emil 32,34
Preskill, John 2
principal character 154
probabilistic Turing machine 16
probability distribution 122
product character 154
projection 109, 153
projection-valued measure 119
pure state 4, 127
quantum bit 8,22
quantum computation VI
quantum computer VI
quantum Fourier transform 41,42
quantum information VI
quantum physics V
quantum register 24,26,116
quantum time evolution 116
Quantum Turing machines 30
qubit 22,116
query operator, modified 80
quotient group 148
random variable 179
range 152
Rayleigh, John 103
recursive solvability 15
reduced row-echelon form 71
rejecting computation 14
relativity theory V
reversible circuits 36
reversible gate 36
Riesz, Frigyes 126
Riesz, Marcel 123
rotation matrix 114
row-echelon form 71
Ruohonen, Keijo 32
scalar multiplication 161
Schrödinger equation 118, 128
Schrödinger picture 135
search problems 73
self-adjoint operator 105, 109
Shannon entropy 122,175,177
Shannon, Claude 2
Shor, Peter 2,33
Simon's promise 63
span 162
spectral representation 110,113
spectrum 110
state VI, 3, 7, 22, 24, 115, 127
state space 7
state vector 3
state vectors 105
subgroup 146
subspace 163
successor 14
superposition VI, 8, 31, 115
surjective morphism 152
tensor product 10,25
time complexity 16
time evolution 135
Toffoli, Tommaso 38
trace 108
tracing over 131
tractability 16
transition function 13
transversal 65
trivial character 154
trivial subgroup 147
Turing machine V, 13, 14
uncertainty principle 121
undecidability 15
unit element 145
unitary matrix 8
unitary operator 109
universal Turing machine 33
variance 121
Vazirani, Umesh 1,33
vector space 161
vector space axioms 161
vector states 127
von Neumann-Lüders measurement 29
Walsh transform 44, 159
wave function 8
Wien, Wilhelm 103
Wootters, W. K. 27
word 13
Yao, Andrew 39
Young 104
zero-knowledge protocol 73
Zurek, W. H. 27
Natural Computing Series
W.M. Spears: Evolutionary Algorithms. The Role of Mutation
and Recombination. XIV, 222 pages, 55 figs., 23 tables. 2000
H.-G. Beyer: The Theory of Evolution Strategies. XIX, 380 pages,
52 figs., 9 tables. 2001
L. Kallel, B. Naudts, A. Rogers (Eds.): Theoretical Aspects
of Evolutionary Computing. X, 497 pages. 2001
M. Hirvensalo: Quantum Computing. XI, 190 pages. 2001
M. Amos: Theoretical and Experimental DNA Computation.
Approx. 200 pages. 2001
L.P. Landweber, E. Winfree (Eds.): Evolution as Computation.
DIMACS Workshop, Princeton, January 1999. Approx. 300 pages.
2001