Evaluation of Boolean Formulas with ... Inputs Bohua Zhan

Evaluation of Boolean Formulas with Restricted Inputs
by
Bohua Zhan
Submitted to the Department of Physics
in partial fulfillment of the requirements for the degree of
Bachelor of Science in Physics
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2010
© Massachusetts Institute of Technology 2010. All rights reserved.
Author
Department of Physics
May 7, 2010
Certified by
Edward Farhi
Professor
Thesis Supervisor
Accepted by
David Pritchard
Physics Thesis Coordinator
Evaluation of Boolean Formulas with Restricted Inputs
by
Bohua Zhan
Submitted to the Department of Physics
on May 7, 2010, in partial fulfillment of the
requirements for the degree of
Bachelor of Science in Physics
Abstract
In this thesis, I will investigate the running time of quantum algorithms for evaluating
boolean functions when the input is promised to satisfy certain conditions. The two
quantum algorithms considered in this paper are the quantum walk algorithm for
NAND trees given by Farhi and Gutmann [2], and an algorithm for more general
boolean formulas based on span programs, given by Reichardt and Spalek [6]. I
will show that these algorithms can run much faster on a certain set of inputs, and
that there is a super-polynomial separation between the quantum algorithm and the
classical lower bound on this problem. I will apply this analysis to quantum walks
on decision trees, as described in [3], giving a class of decision trees that can be
penetrated quickly by quantum walk but may not be efficiently searchable by classical
algorithms.
Thesis Supervisor: Edward Farhi
Title: Professor
Acknowledgments
I would like to thank my advisor, Prof. Edward Farhi, for his guidance throughout
this research project. This work was done in collaboration with Edward Farhi, Jeffrey
Goldstone, Sam Gutmann, Avinatan Hassidim, and Shelby Kimmel. I would not
have come this far without their numerous inputs in discussions over the past year.
I would also like to thank Andrew Lutomirski and Huy Ngoc Nguyen, with whom I
had discussions at various points during this research and from whom I received many
helpful comments.
Contents
1 Introduction
    1.1 Evaluation of NAND Trees
    1.2 Span Programs
2 Analysis of Quantum Algorithm
    2.1 General Case
    2.2 Two Specific Cases
    2.3 Another Viewpoint
3 Classical Lower Bound
    3.1 Statement of the Theorem
    3.2 Preparation
    3.3 Case k = 1
    3.4 General k
        3.4.1 Exceptions
        3.4.2 Bounds on S
        3.4.3 Bounds on D
        3.4.4 Bounds on Pk,
    3.5 An 'Easy' Boolean Function
4 Quantum Walk Through Decision Trees
    4.1 Summary of Previous Work
    4.2 Main Result
5 Conclusion
List of Figures
1-1 Graph corresponding to a NAND tree of height 3.
1-2 Schematic diagram of a gadget.
1-3 A gadget for the 3-MAJ gate.
2-1 Assignment of weights for a sample tree. Empty circles denote nodes
    with value 1, and solid circles denote nodes with value 0.
2-2 Example of a tree that does not satisfy k-faults rule for small k, but
    still has $|A|^2$ polynomial in n. Empty circles denote nodes with value 1,
    and solid circles denote nodes with value 0. Ellipsis denotes attaching
    a subtree without any faults to make the leaves of the resulting tree at
    the same height.
3-1 A sample distribution for the NAND gate
3-2 A sample distribution for the 3-MAJ gate
4-1 An example of a decision tree with two lines attached to the root and
    the unique node at level 4.
4-2 Visualization of entries in M. Each $M_{ij}$ corresponds to a weighted sum
    of paths going from i on the left to j on the right.
List of Tables
Chapter 1
Introduction
Evaluation of boolean functions is a major class of problems considered in the theory
of computation. In particular, we may consider those boolean functions $F$ that can
be written nicely as a tree. The leaves of the tree correspond to inputs. That is,
we consider an input to $F$ as a function associating each leaf $a$ to some $r_a \in \{0, 1\}$.
Then we associate each node $b$ in the tree to some $r_b \in \{0, 1\}$, by requiring that
$r_b = f_b(r_{b_1}, \ldots, r_{b_c})$ for every non-leaf node $b$ in the tree with child nodes
$b_1, \ldots, b_c$, where $f_b : \{0, 1\}^c \to \{0, 1\}$ is a fixed boolean function. The output of
$F$ is the value associated to the root. We may further simplify the situation by
requiring each $f_b$ to be the same function $f$, and that all leaves are at the same
height. The function $F$ is then fixed by fixing $f$ and a height $n$. Throughout the
paper we will let the height $n$ be the number of edges in any path from the root to
the leaves. Equivalently this is the number of times the function $f$ is composed.
As examples, we will mainly consider two such boolean functions: NAND trees
and 3-MAJ trees, but many of our results are more general. The function $f$ for the
NAND tree is the NAND gate, with $f(r_1, r_2) = 0$ if $r_1 = r_2 = 1$, and 1 otherwise. We
may interpret a NAND tree with height n as a two-player game with exactly n turns,
with the turn alternating between the two players. At each turn the player moving
has two choices. An input to the function corresponds to an assignment of each end
position of the game to either win or loss for the player moving. Then the value at
each node in the tree corresponds to whether the player moving will win given best
play, and the output of the overall function F is whether the player moving first will
win with best play. For this reason NAND trees are also called game trees.
The function $f$ for the 3-MAJ tree returns the value of a majority of its three
inputs. That is, $f(r_1, r_2, r_3) = 0$ if and only if at least two of $r_1, r_2, r_3$ are zero.
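The two gates and the bottom-up evaluation of a uniform tree can be sketched as follows. This is a minimal illustration, not part of the original text: the names `nand`, `maj3` and `evaluate` are hypothetical, and the input is assumed to be a flat list of leaf values read from left to right.

```python
def nand(r1, r2):
    """NAND gate: 0 only when both inputs are 1."""
    return 0 if r1 == 1 and r2 == 1 else 1


def maj3(r1, r2, r3):
    """3-MAJ gate: the value taken by at least two of the three inputs."""
    return 1 if r1 + r2 + r3 >= 2 else 0


def evaluate(gate, arity, leaves):
    """Evaluate a uniform tree of `gate` on a flat list of arity**n leaf values."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % arity != 0:
            raise ValueError("number of leaves must be a power of the arity")
        # Replace each group of `arity` siblings by the value of their parent.
        level = [gate(*level[i:i + arity]) for i in range(0, len(level), arity)]
    return level[0]
```

For instance, `evaluate(nand, 2, [1, 1, 0, 1])` evaluates a NAND tree of height 2, and `evaluate(maj3, 3, leaves)` with nine leaves evaluates a 3-MAJ tree of height 2. Every leaf is read here, so the cost in the models below is the full number of inputs.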
There are several widely-used classical models of computation for constructing
algorithms to evaluate these trees, and examining the cost of these algorithms. We
give three examples. In all three examples the cost of evaluating a leaf (or equivalently
querying an input value) is 1, and all other computations are free.
Model 1: the computation is deterministic. No mistakes are allowed. The cost is
the number of evaluations maximized over all inputs.
Model 2: the computation is probabilistic. No mistakes are allowed. The cost is
the expected number of evaluations on a fixed input, maximized over all inputs.
Model 3: the computation is probabilistic, and the algorithm is allowed to make
mistakes with bounded probability, say less than 1/3. This means the algorithm
makes mistakes with probability less than 1/3 on any given input. The cost is the
number of evaluations, maximized over all inputs and choices of the algorithm.
It is clear that for Model 3, taking the maximum number of evaluations is not
very different from taking the maximum over all inputs of the expected number of
evaluations, since the number of evaluations can exceed the expected number by a
large factor only infrequently, so the algorithm can afford to guess randomly if this
happens and only add a small value to its probability of error. When comparing to
quantum algorithms that can make errors with small probability, Model 3 appears to
be the most appropriate.
For the NAND tree, the obvious algorithm is shown to be the best classically, with
cost $O(N^{\log_2[(1+\sqrt{33})/4]}) = O(N^{0.754})$ [7], where $N = 2^n$ is the number of inputs.
However, there are quantum algorithms, first devised by Farhi, Goldstone, and Gutmann,
that can evaluate a NAND tree in time $O(\sqrt{N})$ [2]. The original algorithm, which
will be described below, is in the quantum walk model. It is written in the standard
query model by Ambainis in [1], still with $O(\sqrt{N})$ running time.
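The exponent quoted above can be checked numerically. The sketch below is only an illustration with constants ignored: `classical_queries` and `quantum_time` are hypothetical helper names that return the bare powers $N^{0.754}$ and $\sqrt{N}$ for a tree of height $n$.

```python
import math

# Exponent of the best classical cost for NAND trees, log2((1 + sqrt(33)) / 4).
exponent = math.log2((1 + math.sqrt(33)) / 4)


def classical_queries(n):
    """N**exponent for N = 2**n inputs (constant factors ignored)."""
    return (2 ** n) ** exponent


def quantum_time(n):
    """sqrt(N) for N = 2**n inputs (constant factors ignored)."""
    return math.sqrt(2 ** n)
```

The ratio `classical_queries(n) / quantum_time(n)` grows like $N^{0.254}$, which is a polynomial gap in $N$.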
The optimal classical cost of evaluating 3-MAJ trees is still unknown. In particular,
the obvious algorithm is known to be suboptimal. Reichardt and Spalek [6],
using a concept called "span programs", first gave a quantum algorithm with running
time $O(2^n)$. They also showed in the same paper that this is the best possible. It is
faster than the currently best lower bound of $\Omega((7/3)^n)$ for classical algorithms.
In this paper, we will consider the problem of evaluating boolean functions when
their inputs are restricted to a certain set. We will show that the quantum algorithm can
be much faster on this set of inputs, and that there is a super-polynomial separation
between the cost of classical and quantum algorithms for this problem. This investigation
originated from attempts to find the inputs on which the quantum algorithm
for NAND trees runs quickly, and the decision trees that can be penetrated quickly by a
quantum walk (see [3] and below). However, it can be viewed with more generality
in the context of span programs, which we will do here.
In the next two sections, we will describe the algorithm given in [2] for NAND trees,
and the more general algorithm given in [6] using span programs. In chapter 2, we
will describe the restriction on inputs and analyze the cost of quantum algorithms for
the new problem. In chapter 3, we will analyze classical algorithms on this problem,
deriving a lower bound on the cost of any algorithm in Model 3 above. In chapter
4, we give an application of our results to quantum walk through decision trees. We
conclude in chapter 5 with possible directions of future research.
1.1 Evaluation of NAND Trees
In this section we will describe the algorithm given in [2] for evaluating NAND trees.
The algorithm is given in the quantum walk model. In this model, each case of the
problem is associated with a graph. This is then associated with a quantum system,
where each node in the graph corresponds to a vector in the standard basis, and the
Hamiltonian H is directly read from the graph. In this section we will let H be the
adjacency matrix of the graph. Afterwards we will consider weighted graphs (that is,
every edge $(i, j)$ is given a weight $v_{ij}$), and we let $H_{ij} = v_{ij}$ if there is an edge $(i, j)$,
and $H_{ij} = 0$ otherwise.
After initializing the system to a certain state, the system evolves according to
the Hamiltonian for a certain time t, then the answer to the problem is found by a
measurement on the system. The equations of motion bear some similarity to that
of the classical random walk through a graph, hence the name quantum walk.
The algorithm in the quantum walk model is as follows. Consider the NAND tree
as a graph. For each leaf with value 1, add another node to the graph and connect it
only to that leaf. Next, make a 'runway' consisting of a line of 2M+1 nodes numbered
-M to M, where M is a large integer to be determined. The final graph is formed
by joining the root of the binary tree (denoted by $|\mathrm{root}\rangle$) to the node numbered 0
(denoted by $|r = 0\rangle$) on the runway. See Figure 1-1 for an example.
Figure 1-1: Graph corresponding to a NAND tree of height 3. (Labeled parts: inputs,
NAND tree, runway.)
The initial state is a wave packet on the left branch of the runway, moving to the
right and having a narrow peak in the energy spectrum at $E = 0$. As the system
evolves, part of the packet will transmit through $|r = 0\rangle$, and the other part will
be reflected back. The following equation relating the transmission coefficient to a
parameter of the NAND tree is shown in [2].
\[ T(E) = \frac{2i \sin\theta}{2i \sin\theta + y(E)}, \tag{1.1} \]
where
\[ E = -2\cos\theta, \qquad y(E) = \frac{\langle \mathrm{root}|E\rangle}{\langle r = 0|E\rangle}, \]
and $|E\rangle$ is the exact eigenstate of the system with energy $E$. Here we assume that the
runway is essentially infinite. Given any $\langle r = 0|E\rangle$, one can see by counting equations
and unknowns that for generic values of $E$, the equation $H|E\rangle = E|E\rangle$ fixes $\langle b|E\rangle$ for
every node $b$ in the tree (that is, above $|r = 0\rangle$). Moreover the value $\langle \mathrm{root}|E\rangle$ fixed
this way is proportional to $\langle r = 0|E\rangle$. So the quantity $y(E)$ can be defined even when
the runway is finite.
The function $y(E)$ can be computed as follows. First, for every node $b$ in the tree,
let $b'$ be the parent of $b$ (the parent of $|\mathrm{root}\rangle$ is $|r = 0\rangle$). Define
\[ y_b(E) = \frac{\langle b|E\rangle}{\langle b'|E\rangle}. \]
Then if $b_1, b_2$ are child nodes of $b$, the equation $H|E\rangle = E|E\rangle$ turns into the recursive
relation
\[ y_b(E) = \frac{1}{E - y_{b_1}(E) - y_{b_2}(E)}, \]
and $y_b(E) = 1/E$ if $b$ is a leaf.
Using this recursive relation, one can compute $y(E) = y_{\mathrm{root}}(E)$. By induction
from the leaves to the root, one can show that $y_b(E) \to 0$ as $E \to 0$ if $b$ has value 1,
and $y_b(E) \to \pm\infty$ as $E \to 0$ otherwise. In particular the behavior of $y(E)$ as $E \to 0$
indicates the value of the root. Using equation 1.1, one can see that $T(E) \approx 0$ at
$E \approx 0$ if the root has value 0, and $T(E) \approx 1$ otherwise. Thus we can obtain the value
at the root by measuring how much of the initial wave packet transmitted through
$|r = 0\rangle$. In order for this to work, we need $|y(E)|$ to be far from 1 for the range of
$E$ in the packet (either near zero or very large depending on the value of the root).
From the range of $E$ around 0 for which this is true, we can find how narrow the
wave packet needs to be in energy space, which in turn gives the size of the packet in
position space, which gives the length of the runway and evolution time required. So
the cost of the algorithm is inversely proportional to the size of the range of $E$ for
which $y(E)$ is far from 1. It is shown in [2] (by a method similar to the one described
in the next section) that the size of this range is at least of order $1/\sqrt{N}$, where
$N = 2^n$ is the number of leaves. The cost is therefore $O(\sqrt{N})$, which is shown in the
paper to be the best possible.
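The recursion for $y_b(E)$ and equation 1.1 can be explored numerically. The following is a minimal sketch under stated assumptions, not the algorithm itself: a tree is a nested pair of inputs, an input 0 is a bare leaf with $y = 1/E$, and an input 1 is a leaf with one extra node attached, so $y = 1/(E - 1/E)$. The function names are hypothetical.

```python
import math


def nand_value(tree):
    """Classical value of a NAND tree given as nested pairs of 0/1 inputs."""
    if tree in (0, 1):
        return tree
    left, right = tree
    return 0 if nand_value(left) == 1 and nand_value(right) == 1 else 1


def y(tree, E):
    """y_b(E) from the recursion y_b = 1 / (E - y_b1 - y_b2)."""
    if tree == 0:
        return 1.0 / E                  # bare leaf
    if tree == 1:
        return 1.0 / (E - 1.0 / E)      # leaf with one extra node attached
    left, right = tree
    return 1.0 / (E - y(left, E) - y(right, E))


def transmission(tree, E):
    """|T(E)|^2 from equation 1.1, using sin(theta) = sqrt(1 - (E/2)^2)."""
    sin_theta = math.sqrt(1.0 - (E / 2.0) ** 2)
    return abs(2j * sin_theta / (2j * sin_theta + y(tree, E))) ** 2
```

For small $E$, `transmission` is close to 1 when `nand_value` is 1 and close to 0 otherwise, which is the behaviour the measurement relies on.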
1.2 Span Programs
The use of span programs in quantum algorithms is introduced by Reichardt and
Spalek in [6].
We will discuss in detail only the algorithm and the analysis of its
running time, for the special case where each input is used only once. This is sufficient
for the rest of the paper since it includes the case of NAND and 3-MAJ trees. We
will motivate the construction from the point of view of quantum walks.
To generalize the algorithm in the last section to evaluate other boolean formulas,
in particular those that can be written as a tree with boolean function $f$ at each
node, one can try to construct subgraphs, or 'gadgets', with the following properties.
Let $c$ be the number of inputs to $f$. The gadget should have $c + 1$ outgoing edges,
each connected to a different vertex inside the gadget. This is shown schematically
in Figure 1-2.
Figure 1-2: Schematic diagram of a gadget.
The edge shown at the bottom of the figure corresponds to the output $x_0$, while
the $c$ edges at the top are inputs $x_1, \ldots, x_c$. To each input and output, we can define
\[ y_i(E) = \frac{\langle b_i|E\rangle}{\langle a_i|E\rangle}, \qquad 0 \le i \le c, \tag{1.2} \]
similar to before. The function $y_i$ indicates the value $x_i$ in the sense that we require,
as $E \to 0$,
\[ y_i(E) \to 0 \text{ if } x_i = 1, \qquad y_i(E) \to \pm\infty \text{ if } x_i = 0. \tag{1.3} \]
Once $y_i(E)$, $1 \le i \le c$ and $\langle a_0|E\rangle$ are fixed, by counting equations and unknowns one
can see that for generic $E$ the vector $|E\rangle$ is uniquely determined inside the gadget
by the equation $H|E\rangle = E|E\rangle$. Moreover the value of $\langle b_0|E\rangle$ thus determined is
proportional to $\langle a_0|E\rangle$. So $y_0(E)$ is a function of $y_i(E)$, $1 \le i \le c$. For the gadget to
correspond to a boolean function $f : \{0, 1\}^c \to \{0, 1\}$, we require that given $y_i(E)$,
$1 \le i \le c$ satisfying condition 1.3, $y_0(E)$ also does with $x_0 = f(x_1, \ldots, x_c)$.
We can join these gadgets together to evaluate a function $F$ as follows: let each
non-leaf node $b$ in the tree correspond to a gadget $P_b$, and if node $b$ is the $i$'th child
node of $b'$ in the original tree, then identify the edge $(a_0, b_0)$ in $P_b$ with the edge
$(a_i, b_i)$ in $P_{b'}$. The leaves of the original tree just correspond to the edges $(a_i, b_i)$ in the
last level of gadgets. An input to $F$ amounts to removing some of the edges $(a_i, b_i)$
corresponding to leaves. For consistency, the edge $(a_i, b_i)$ is kept if and only if the
input $x_i$ is 0. Then $y_i(E) = 1/E$ if $x_i = 0$ and $y_i(E) = 0$ otherwise, so condition 1.3
is satisfied. By induction $y(E)$ at the root indicates the output of $F$.
In order to have some bound on the running time of the algorithm, we can set
further conditions on $y_0(E)$ as a function of $y_i(E)$, $1 \le i \le c$. Define $s_i$ such that:
\[ s_i(E) = \frac{y_i(E)}{E} \text{ if } x_i = 1 \quad \text{and} \quad s_i(E) = \frac{1}{E\, y_i(E)} \text{ if } x_i = 0. \tag{1.4} \]
Roughly speaking, the smaller the values of $s_i$, the slower $y_i(E)$ approaches order
unity from either 0 or $\pm\infty$. We would like a condition such as $|s_0(E)|$ is at most a
constant multiple of $s = \max_{1 \le i \le c} |s_i(E)|$.
The paper [6] gives a construction of such a gadget based on span programs. A
span program $P$ corresponding to a boolean function $f : \{0, 1\}^c \to \{0, 1\}$ consists of a
target vector $t$ and input vectors $v_i$, $1 \le i \le c$ such that $f(x_1, \ldots, x_c) = 1$ if and only
if $t$ is a linear combination of $v_i$, where the coefficient of $v_i$ must be zero if $x_i = 0$.
Here the vector space is over $\mathbb{C}$, although different fields are also used. For example,
a span program for the 3-MAJ function is:
\[ t = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad v = \begin{pmatrix} \alpha & \alpha & \alpha \\ 1 & \omega & \omega^2 \end{pmatrix}, \]
where $\alpha = 1/\sqrt{3}$ and $\omega = e^{2\pi i/3}$. The above notation means $v_i$, $1 \le i \le 3$ are the
columns of the matrix $v$.
It is clear that by a linear transformation on both $t$ and $v_i$, we can always make
$t = (1, 0, \ldots, 0)^T$. The gadget is then constructed from $v_i$ as follows: the nodes in the gadget
include $b_0, a_1, \ldots, a_c$ as shown in the figure, as well as auxiliary nodes $b'_1, \ldots, b'_d$, where
$d + 1$ is the dimension of the vector space. The weight on each edge $(a_i, b_i)$, $0 \le i \le c$
is 1. The node $b_0$ is connected to each $a_i$ with weight $(v_i)_1$, and the nodes $b'_j$ are
connected to nodes $a_i$ with weight $(v_i)_{j+1}$. Figure 1-3 shows the gadget from the
span program for the 3-MAJ gate shown above.
Figure 1-3: A gadget for the 3-MAJ gate.
First, we have the following.
Theorem 1.2.1 ([6], Thm. 2.5). Consider the gadget constructed as above from a
span program $P$ corresponding to a boolean function $f$. Define $y_i(E)$, $0 \le i \le c$ as in
equation 1.2. Then $y_i(E)$, $1 \le i \le c$ satisfying condition 1.3 implies that $y_0(E)$ satisfies the
same condition, with $x_0 = f(x_1, \ldots, x_c)$.
To give a bound on $|s_0|$ in terms of $s = \max_{1 \le i \le c} |s_i|$, we need to use the concept of
witness size. A witness can be considered as a certificate for whether the span program
has a solution. Let $V$ be the span of all $|v_i\rangle$ such that $x_i = 1$. If $f(x_1, \ldots, x_c) = 1$, then
$t \in V$, so there exists a vector $|w\rangle$ such that $w_i = 0$ for all $x_i = 0$ and $v|w\rangle = |t\rangle$.
Otherwise, $t \notin V$ implies the projection of $|t\rangle$ onto the complement of $V$ is nonzero.
So there is a vector $|w'\rangle$ such that $\langle t|w'\rangle = 1$ and $\langle v_i|w'\rangle = 0$ for all $x_i = 1$. We call
either $|w\rangle$ or $|w'\rangle$ a witness for $P$ and $x$.
Define $s_i(E)$ as in equation 1.4. Let $s_i = \sup_{E \in (0, E_0)} |s_i(E)|$ for some chosen $E_0$.
Let $S$ be the diagonal matrix with $s_i$ on the diagonal. Then the witness size is defined
as the following:
Definition 1.2.2 ([6], Def. 3.6). Let $P$ be a span program corresponding to function $f$.
If $x_0 = f(x) = f(x_1, \ldots, x_c) = 1$, then wsize$(P, x)$ equals $\| S^{1/2} |w\rangle \|^2$, minimized over
all witnesses $|w\rangle$ for $x_0 = 1$. Otherwise, wsize$(P, x)$ equals $\| S^{1/2} v^T |w'\rangle \|^2$, minimized
over all witnesses $|w'\rangle$ for $x_0 = 0$.
The significance of the witness size is shown in the following theorem.
Theorem 1.2.3 ([6], Def 3.3, Thm. 3.7). Let $P$ be a span program corresponding to
$f$, and define $y_i(E)$, $s_i(E)$ and $s_i$ as above. Then there exist constants $c_1$ and $c_2$ such
that
\[ |s_0| \le c_1 + \mathrm{wsize}(P, x)\,(1 + c_2 E_0 \max_{1 \le i \le c} s_i) \tag{1.5} \]
for each input $x$ and maximum energy $E_0 > 0$ such that $E_0 \max_{1 \le i \le c} s_i < \epsilon$ for some
small $\epsilon$ determined in the proof.
Currently the witness size of a gate depends on the values of $s_i$ for inputs. However,
we can extract a less informative, but conceptually simpler quantity that measures
the intrinsic difficulty of a gate on a given input. It is clear from the definition
that witness size is an increasing function of each $s_i$. Let $W(P, x) = \min \langle w|w\rangle$ or
$\min \langle w'|w'\rangle$ depending on whether $f(x) = 1$ or 0 (minimum taken over all witnesses).
Then the following follows immediately from Definition 1.2.2:
\[ \mathrm{wsize}(P, x) \le (\max_{1 \le i \le c} s_i)\, W(P, x). \tag{1.6} \]
Combining equations 1.6 and 1.5, we have the following inequality, which we will use in
the next chapter. Let $s' = \max_{1 \le i \le c} s_i$, then
\[ |s_0| \le c_1 + s' W(P, x)(1 + c_2 E_0 s'). \tag{1.7} \]
The constants $W(P, x)$ can be explicitly calculated. For the span program $P$
constructed above for the majority gate, $W(P, x) = 1$ if $x_1 = x_2 = x_3$ and $W(P, x) = 2$
otherwise. In particular the span program above is optimized so that $\max_x W(P, x) = 2$
is the smallest possible. For the NAND gate, one can construct a span program for
the AND gate, then append single edges to achieve the effect of a NOT gate. The
result is the NAND tree given in the previous section with two single edges inserted
between each level. The effects of these two edges cancel out, giving the original NAND
tree (in particular single edges do not increase $y(E)$ by much). For one possible span
program for the AND gate which we will use later, we have $W(P, x) = 2$ if $x_1 \ne x_2$ and
1 otherwise. It is also possible to construct an AND gate with $\max_x W(P, x) = \sqrt{2}$,
which gives the optimal running time for arbitrary inputs.
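The values quoted for the majority gate can be reproduced numerically. The sketch below rests on several assumptions that are not spelled out in the text: the span program is read as $t = (1, 0)$ with columns $v_k = (\alpha, \omega^{k-1})$ and $\alpha = 1/\sqrt{3}$; the quantity computed is the witness size of Definition 1.2.2 with every $s_i$ set to 1, using $\langle v_i|w'\rangle$ for the components of $v^T|w'\rangle$; and a singular Gram matrix is taken to mean that $t$ is outside the span, which holds for this program because no column has a zero second component. The names `V`, `inner` and `evaluate` are hypothetical.

```python
import cmath
import math
from itertools import product

ALPHA = 1 / math.sqrt(3)
OMEGA = cmath.exp(2j * math.pi / 3)
# Columns v_1, v_2, v_3 of the matrix v; the target vector is t = (1, 0).
V = [(complex(ALPHA), OMEGA ** k) for k in range(3)]


def inner(u, w):
    """<u|w> for two-component complex vectors."""
    return u[0].conjugate() * w[0] + u[1].conjugate() * w[1]


def evaluate(x):
    """Return (f(x), witness size with all s_i = 1) for input bits x."""
    on = [v for v, bit in zip(V, x) if bit == 1]
    # Gram matrix G = A A^dagger of the allowed columns (A is 2 x k).
    g00 = sum(abs(v[0]) ** 2 for v in on)
    g11 = sum(abs(v[1]) ** 2 for v in on)
    g01 = sum(v[0] * v[1].conjugate() for v in on)
    det = g00 * g11 - abs(g01) ** 2
    if det > 1e-9:
        # t is in the span; the least-norm w with A w = t has
        # <w|w> = t^dagger G^{-1} t = g11 / det.
        return 1, g11 / det
    # t is outside the span. Take w' = (1, c), so that <t|w'> = 1.
    if on:
        v = on[0]
        c = -v[0].conjugate() / v[1].conjugate()   # enforces <v|w'> = 0
    else:
        # No constraint: choose c minimizing sum_i |<v_i|w'>|^2.
        c = -sum(v[1] * v[0].conjugate() for v in V) / sum(abs(v[1]) ** 2 for v in V)
    w_prime = (1 + 0j, c)
    return 0, sum(abs(inner(v, w_prime)) ** 2 for v in V)
```

Looping over `product((0, 1), repeat=3)` gives witness size 1 for the two inputs with $x_1 = x_2 = x_3$ and 2 for the other six, with the first component equal to the majority of the bits.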
We will use Equation 1.7 and these values of W(P, x) to give upper bounds on
the running time of the quantum algorithm in the next section. However, we will also
give explicit calculations to provide more intuition, as well as verifying the values of
W(P,x) given here.
From the results summarized above, Reichardt and Spalek gave an $O(2^n)$ algorithm
for the 3-MAJ tree, as well as similar algorithms for other gates. Moreover,
they showed that for a large class of trees, including trees of uniform height made
of a single boolean function on at most three variables, the algorithm given by the
span program is optimal. In later papers, Reichardt extended this optimality result
to essentially all boolean formulas, see [5] and [4] for details.
Chapter 2
Analysis of Quantum Algorithm
In the previous chapter, we described how to construct quantum algorithms to evaluate
a boolean function $F$ that can be written as a tree with function $f$ at each node,
given a span program corresponding to $f$. We noted that the running time required
by the algorithm depends on the behavior of the function $y(E)$, in particular the
functions $s(E)$ defined in 1.4.
In this chapter we will describe in detail the calculation of $s(E)$. In [6], the focus
is on the maximum of $s(E)$ over all inputs. Here we will focus on how it varies with
the actual input, and for which inputs it will be small.
In section 2.1, we will use equation 1.7 from last chapter to obtain a general result
applying to any span programs. In section 2.2, we give concrete calculations for the
NAND tree and 3-MAJ tree, verifying the values of W(P, x) as well as equation 1.7 in
these cases. In section 2.3, we will give another way to see how the results for NAND
trees can be obtained.
2.1 General Case
The general result is the following.
Theorem 2.1.1 (cf. [6] Thm. 4.16). Let $P$ be a span program corresponding to
function $f$. Let $F$ be a boolean function that can be written as a tree with $f$ at each
node. Let $n$ be the height of the tree. Construct the quantum algorithm as before and
let $s$ be the value of $s_0$ at the root. Suppose $W(P, x) \ge 1$ for all $x$. Then for a given
input $x$ to $F$, we have $s = O(n W_T)$ with $E_0 = O(n^{-2} W_T^{-1})$, where
\[ W_T = \max_{\text{paths } \chi} \prod_{b \in \chi} W(P, x(b)), \tag{2.1} \]
where the maximum is taken over all paths from the root to the leaves, and $x(b)$ is the
input at the node $b$.
Proof. If we disregard the $c_1$ and $c_2$ terms in equation 1.7, then at every node of the
tree representing $F$, the value of $s_0$ is the maximum over all paths of the product of
$s_i$ at the leaf and the difficulties $W(P, x(b))$ at the gates along the path. The value of
$s_i$ at the leaves is at most 1. This gives $s \le W_T$ if $c_1 = c_2 = 0$.
Since we are giving an upper bound we may take $c_1, c_2 > 0$. Then $s$ is the
maximum over all paths from root to the leaves of a sequence of additions by $c_1 > 0$
and multiplications by $W(P, x)(1 + c_2 E_0 s_b) \ge 1$. For any such sequence, we can always
put all additions before all multiplications, without lowering the resulting value. The
addition part gives $1 + n c_1$ (where 1 comes from the initial $s_i$ at the leaves). The
multiplication part for path $\chi$ is
\[ \prod_{b \in \chi} W(P, x(b))(1 + c_2 E_0 s_b) \le \prod_{b \in \chi} W(P, x(b))(1 + c_2 E_0 s). \]
By taking $E_0 = 1/(c_2 n s)$, each factor $1 + c_2 E_0 s$ equals $1 + 1/n$, and since
$(1 + 1/n)^n \le e$ the product is at most $e \prod_{b \in \chi} W(P, x(b))$.
Taking the maximum over all paths, we find
\[ s \le (1 + n c_1) \cdot (e W_T) = O(n W_T). \]
From this we obtain $E_0 = O(n^{-2} W_T^{-1})$. □
If for some span program $P$, there are inputs $x = (x_1, \ldots, x_c)$ and $y = (y_1, \ldots, y_c)$
such that $f(x) \ne f(y)$ and $W(P, x) = W(P, y) = 1$, then there are inputs where
$W(P, x(b)) = 1$ for most of the nodes $b$ in the tree. In particular, we can consider
trees such that along every path from root to the leaves, the number of nodes $b$ on
the path where $W(P, x(b)) \ne 1$ is at most $k$, where $k < n$ is fixed. This motivates the
following definition.
Definition 2.1.2. Let $P$ be a span program corresponding to a function $f$. We say
an input $x = (x_1, \ldots, x_c)$ is trivial if $W(P, x) = 1$, and non-trivial otherwise. For a
given $k$, we say an input $x$ to the overall function $F$ satisfies the k-faults rule if along
any path from the root to the leaves in the tree corresponding to $F$, there are at most
$k$ nodes $b$ where the input $x(b)$ at $b$ is non-trivial. Often we will just say that the tree
corresponding to $F$ satisfies the k-faults rule.
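For NAND trees, the quantities in Theorem 2.1.1 and Definition 2.1.2 can be computed in one pass over the tree. The sketch below is illustrative only: it assumes the values $W(P, x) = 2$ when the two inputs to a gate differ and 1 otherwise, as quoted in section 1.2, represents a tree as nested pairs of 0/1 inputs, and uses the hypothetical names `analyze` and `satisfies_k_faults`.

```python
def analyze(tree):
    """Return (value, faults, w_t) for a NAND tree given as nested pairs.

    faults is the largest number of non-trivial nodes on any root-to-leaf
    path, and w_t is the largest product of W(P, x(b)) along any such path.
    """
    if tree in (0, 1):
        return tree, 0, 1
    (v1, k1, w1), (v2, k2, w2) = analyze(tree[0]), analyze(tree[1])
    value = 0 if v1 == 1 and v2 == 1 else 1
    w_here = 2 if v1 != v2 else 1          # assumed W(P, x) for this gate
    faults = max(k1, k2) + (1 if w_here != 1 else 0)
    return value, faults, w_here * max(w1, w2)


def satisfies_k_faults(tree, k):
    """True if no root-to-leaf path contains more than k non-trivial nodes."""
    return analyze(tree)[1] <= k
```

With these values of $W$, `w_t` is always `2 ** faults`, so a tree satisfying the k-faults rule has $W_T \le 2^k$.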
The following is immediate:
Corollary 2.1.3. Consider $P$, $f$, $F$ and $s$ as before, and let $W(P)$ denote $\max_x W(P, x)$.
If input $x$ satisfies the k-faults rule, then $s = O(n \cdot W(P)^k)$ with $E_0 = O(n^{-2} W(P)^{-k})$.
So the running time of the quantum algorithm can be set as $O(n^2 W(P)^k)$.
Proof. The statements for $s$ and $E_0$ follow immediately from Theorem 2.1.1 and
Definition 2.1.2, since at most $k$ nodes on any path are non-trivial and hence
$W_T \le W(P)^k$. As observed in the last chapter, the shortest running time for which
the quantum algorithm makes errors with small probability is inversely proportional
to the size of the range of $E$ where $y(E)$ is far from 1. For $|E| < E_0$, we have
$|y(E)| < s|E| = O(n^{-1})$ if the output is 1, and $|y(E)| > 1/(s|E|) = \Omega(n)$ otherwise. So
$y(E)$ is far from 1 within a range of size $E_0$. □
If we take $k = a \log n$ for some constant $a$, then the running time becomes
$n^2 W(P)^{a \log n} = n^{2 + a \log W(P)}$, which is polynomial in $n$. In the next chapter, we
will give, for a class of boolean functions $f$ that includes NAND and 3-MAJ, a lower
bound of $\Omega((\log n/d_1)^{k/d_2}) = n^{a(\log\log n - \log d_1)/d_2}$ evaluations for classical algorithms,
where $d_1, d_2$ are constants. This gives the superpolynomial separation between classical
and quantum algorithms on evaluating $F$ for inputs satisfying the k-faults rule.
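The two rewritings above are simple identities in $n$, which the following sketch checks numerically. The function names are hypothetical, natural logarithms are assumed throughout, and any values passed for $d_1$ and $d_2$ are placeholders since the text only says they are constants.

```python
import math


def quantum_cost(n, a, W):
    """n^2 * W^(a log n): the running-time bound with k = a log n."""
    return n ** 2 * W ** (a * math.log(n))


def quantum_cost_as_power(n, a, W):
    """The same bound rewritten as n^(2 + a log W)."""
    return n ** (2 + a * math.log(W))


def classical_bound(n, a, d1, d2):
    """(log n / d1)^(k / d2) with k = a log n."""
    return (math.log(n) / d1) ** (a * math.log(n) / d2)


def classical_exponent(n, a, d1, d2):
    """Exponent e(n) such that the classical bound equals n^e(n)."""
    return a * (math.log(math.log(n)) - math.log(d1)) / d2
```

Because `classical_exponent` grows like $\log\log n$, the classical bound eventually exceeds any fixed power of $n$, while the quantum cost stays a fixed power.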
Note that the conclusion of Corollary 2.1.3 does not depend on the condition that the
leaves of the original NAND tree are all at the same height. The corollary applies to
the evaluation of the value at the root of any tree where the leaves have value 0 and
internal nodes are assigned values by the extended NAND rule: the usual NAND rule
together with negating the value at single edges. Since single edges do not increase
$y(E)$ by much, the parent of that edge is not considered a fault. We will make use of
this observation in chapter 4.
2.2 Two Specific Cases
In this section, we will give detailed computations for the NAND gate and the 3-MAJ
gate, verifying the result in the previous section. Since the result is proven in the
previous section, we will not be entirely rigorous in this section.
For the NAND gate, we will simply analyze the algorithm described in section 1.1.
Recall that we associated the function $y_b(E)$ to each node $b$ (instead of each edge as
for span programs). This is defined by $y_b(E) = \langle b|E\rangle / \langle b'|E\rangle$ where $b'$ is the parent of
$b$, and the recursion relation is:
\[ y_b(E) = \frac{1}{E - y_{b_1}(E) - y_{b_2}(E)}, \]
where $b_1, b_2$ are child nodes of $b$, and $y_b(E) = 1/E$ if $b$ is a leaf. It is clear that for
the recursion relation, $y_b(E)$ increases with both $y_{b_1}(E)$ and $y_{b_2}(E)$ on each side of the
pole $y_{b_1}(E) + y_{b_2}(E) = E$.
For energies $|E| < E_0$, where $E_0$ is to be determined later, we will verify that
$0 > y_b(E) > -s_b E$ if the value at node $b$ is 1, and that $y_b(E) > 1/(2 s_b E)$ otherwise
(we will see soon the purpose of the factor of 2). Initially this is true with $s_b \le 1$.
Consider a node $b$ with child nodes $b_1$ and $b_2$. Let $s = s_b$ and $s' = \max(s_{b_1}, s_{b_2})$.
Then $y_{b_i}$ satisfies the conditions for $s'$ if it does for $s_{b_i}$. So we may replace each $s_{b_i}$
by $s'$. There are three cases for the recursion:
Case 1: $v(b_1) = v(b_2) = 0$. From $y_{b_i}(E) > 1/(2s'E)$, $i = 1, 2$ we have
\[ y_b(E) > \frac{1}{E - \frac{1}{2s'E} - \frac{1}{2s'E}} = \frac{1}{E - \frac{1}{s'E}} = -s'E(1 + O(s'E^2)). \]
We assumed that $y_{b_i}(E) \gg E$, which should be valid for small $E$. So we may take
$s = s'(1 + O(s'E^2))$. This verifies the equation analogous to 1.7 with $W(P, (0, 0)) = 1$.
Case 2: $v(b_1) = v(b_2) = 1$. From $0 > y_{b_i}(E) > -s'E$ we have
\[ y_b(E) > \frac{1}{E + s'E + s'E} = \frac{1}{2(s' + \frac{1}{2})E}. \]
So we may take $s = s' + 1/2$, verifying $W(P, (1, 1)) = 1$.
Case 3: $v(b_1) = 0$, $v(b_2) = 1$. So
\[ y_b(E) \ge \frac{1}{E + s'E - \frac{1}{2s'E}} = \frac{-2s'E}{1 - 2s'E^2 - 2s'^2E^2} = -2s'E(1 + O(s'E)). \]
So we may take $s = 2s'(1 + O(s'E))$, verifying $W(P, (0, 1)) = W(P, (1, 0)) = 2$. Note
that we do need a condition on $s_{b_1}$, since if it is too large, the term $y_{b_1}(E)$ may
dominate $y_{b_2}(E)$ for the size of $E$ we are considering here.
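The three cases can be checked against the exact recursion for small trees. The following is a rough sketch under stated assumptions rather than a proof: the $O(\cdot)$ corrections are absorbed into a small multiplicative `SLACK`, an input 0 is a bare leaf with $s = 1$, an input 1 is a leaf with one extra node attached (so $y = 1/(E - 1/E)$ and $s$ is just above 1), and all names are hypothetical.

```python
SLACK = 1 + 1e-6  # absorbs the higher-order O(.) corrections for small E


def y(tree, E):
    """Exact y_b(E) from the recursion, for nested pairs of 0/1 inputs."""
    if tree == 0:
        return 1.0 / E
    if tree == 1:
        return 1.0 / (E - 1.0 / E)
    return 1.0 / (E - y(tree[0], E) - y(tree[1], E))


def bound(tree):
    """Return (value, s) by propagating s through the three cases."""
    if tree == 0:
        return 0, 1.0
    if tree == 1:
        return 1, SLACK
    (v1, s1), (v2, s2) = bound(tree[0]), bound(tree[1])
    s_prime = max(s1, s2)
    if v1 == 0 and v2 == 0:
        return 1, s_prime * SLACK            # Case 1: s = s'
    if v1 == 1 and v2 == 1:
        return 0, (s_prime + 0.5) * SLACK    # Case 2: s = s' + 1/2
    return 1, 2 * s_prime * SLACK            # Case 3: s = 2 s'


def satisfies_bound(tree, E):
    """Check 0 > y > -sE (value 1) or y > 1/(2sE) (value 0) at energy E > 0."""
    value, s = bound(tree)
    yb = y(tree, E)
    if value == 1:
        return -s * E < yb < 0
    return yb > 1 / (2 * s * E)
```

Running `satisfies_bound` over all inputs of a small tree at, say, $E = 10^{-5}$ is a quick way to see that $s$ doubles only at nodes whose two inputs differ.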
Now we consider the 3-MAJ gate. For a given node $b$ in the tree with child nodes
$b_i$, $i = 1, 2, 3$, we may again consider $y_b(E)$ as a function of $y_{b_i}(E)$ for $i = 1, 2, 3$. Now
it is more difficult to show that this function is increasing in each $y_{b_i}(E)$. We will
assume that this is the case. So if $s' = \max(s_{b_1}, s_{b_2}, s_{b_3})$, then we may replace each
$y_{b_i}(E)$ by either $-s'E$ or $1/(s'E)$.
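The gadget for the 3-MAJ gate is shown in Figure 1-3. By abuse of notation,
we will let $a_0$ also represent $\langle a_0|E\rangle$, and similarly for other nodes. Reading from the
figure, we see that the equation $H|E\rangle = E|E\rangle$ becomes:
\begin{align*}
E b_0 &= a_0 + \alpha(a_1 + a_2 + a_3) \\
E b'_1 &= a_1 + \omega a_2 + \omega^2 a_3 \\
E a_1 &= \alpha b_0 + b'_1 + b_1 \\
E a_2 &= \alpha b_0 + \bar{\omega} b'_1 + b_2 \\
E a_3 &= \alpha b_0 + \bar{\omega}^2 b'_1 + b_3
\end{align*}
If we set $a_0 = -1$ and substitute in $b_i = a_i y_{b_i}(E)$, $1 \le i \le 3$, we get the following
system of equations:
\[
\begin{pmatrix}
-E & 0 & \alpha & \alpha & \alpha \\
0 & -E & 1 & \omega & \omega^2 \\
\alpha & 1 & -E + y_{b_1}(E) & 0 & 0 \\
\alpha & \bar{\omega} & 0 & -E + y_{b_2}(E) & 0 \\
\alpha & \bar{\omega}^2 & 0 & 0 & -E + y_{b_3}(E)
\end{pmatrix}
\begin{pmatrix} b_0 \\ b'_1 \\ a_1 \\ a_2 \\ a_3 \end{pmatrix}
=
\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}
\]
It is clear how to generalize this to other span programs. For generic $E$ this has a
unique solution. Then $y_b(E) = b_0/a_0 = -b_0$.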
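The linear system can also be solved numerically for given child values. The sketch below is a hedged illustration, not the author's computation: it assumes $\alpha = 1/\sqrt{3}$, assumes the Hamiltonian is Hermitian so that the weights in the $a_i$ rows are the complex conjugates of those in the $b'_1$ row, substitutes $-s'E$ or $1/(s'E)$ for each $y_{b_i}(E)$ as described above, and uses a plain Gauss-Jordan elimination. The names `solve` and `y_out` are hypothetical.

```python
import cmath
import math

ALPHA = 1 / math.sqrt(3)
OMEGA = cmath.exp(2j * math.pi / 3)


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * p for x, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def y_out(x, s_prime, E):
    """y_b(E) = -b_0 for child values x, with a_0 = -1."""
    y = [-s_prime * E if bit == 1 else 1 / (s_prime * E) for bit in x]
    wbar = OMEGA.conjugate()
    A = [
        [-E, 0, ALPHA, ALPHA, ALPHA],
        [0, -E, 1, OMEGA, OMEGA ** 2],
        [ALPHA, 1, y[0] - E, 0, 0],
        [ALPHA, wbar, 0, y[1] - E, 0],
        [ALPHA, wbar ** 2, 0, 0, y[2] - E],
    ]
    b0 = solve(A, [1, 0, 0, 0, 0])[0]   # unknowns are (b_0, b'_1, a_1, a_2, a_3)
    return -b0
```

Under these assumptions, with $s' = 1$ and $E = 10^{-3}$, the outputs are close to $-(s'+1)E$ for $x = (1, 1, 1)$, $1/((s'+1)E)$ for $(0, 0, 0)$, $-2(s'+1)E$ for $(0, 1, 1)$ and $3/((6s'+4)E)$ for $(0, 0, 1)$.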
For the case $x_1 = x_2 = x_3 = 1$, we may substitute in $y_{b_i}(E) = -s'E$. The result is
\[ -b_0 = y_b(E) = \frac{-(s'+1)E}{1 - (s'+1)E^2} = -(s' + 1)E(1 + O(s'E^2)). \]
So we may take $s = (s' + 1)(1 + O(s'E^2))$, verifying equation 1.7 with $W(P, (1, 1, 1)) = 1$.
Similarly we can compute:
\begin{align*}
x_1 = x_2 = x_3 = 0: \quad & y_b(E) = \frac{1 - s'E^2}{(s'+1)E - s'E^3} = \frac{1 + O(s'E^2)}{(s'+1)E}, \\
x_1 = 0,\ x_2 = x_3 = 1: \quad & y_b(E) = -2(s'+1)E(1 + O(s'E)), \\
x_1 = x_2 = 0,\ x_3 = 1: \quad & y_b(E) = \frac{3 + O(s'E)}{6s'E + O(E)} = \frac{1 + O(s'E)}{2s'E},
\end{align*}
which verifies $W(P, (0, 0, 0)) = 1$ and $W(P, (0, 1, 1)) = W(P, (0, 0, 1)) = 2$.
2.3 Another Viewpoint
In this section we will focus on NAND trees as described in section 1.1. We will
give the expansion of $y(E)$ around $E = 0$, and another way to view the result in the
previous two sections. While this approach does not yet yield any rigorous proofs,
it may give additional intuition about the situation, especially when considering for
which additional trees the value of $y(E)$ may be small. Indeed this is the method by
which the constraint on the input was first arrived at.
Recall the construction of the quantum system given in section 1.1. Let $H$ be
the Hamiltonian of the entire system $S$, and $H'$ be the Hamiltonian of the system
$S'$ derived from a graph consisting of only the tree (including leaves added for the
input). Let $|E\rangle$ be an eigenstate of $H$ with energy $E$, and let $|E'\rangle$ be the vector
obtained by restricting $|E\rangle$ to $S'$. Then $|E'\rangle$ satisfies the equation $H'|E'\rangle = E|E'\rangle$
everywhere except at the root. For the root, an extra term is needed to account for
the $|r = 0\rangle$ neighbor of the root. These can be summarized as:
\[ \langle r = 0|E\rangle\, |\mathrm{root}\rangle + H'|E'\rangle = E|E'\rangle. \]
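Now let $|\chi_i\rangle$ be the energy eigenstates of $H'$. Expanding the equation above in
terms of these eigenstates, we have:
$$
\sum_i \left( \langle r = 0|E\rangle\, a_i |\chi_i\rangle + \lambda_i c_i |\chi_i\rangle \right) = \sum_i E c_i |\chi_i\rangle,
$$
where $\lambda_i$ is the eigenvalue of $|\chi_i\rangle$, and $a_i$, $c_i$ are the coefficients of $|\mathrm{root}\rangle$ and
$|E'\rangle$ respectively when expanded in terms of the eigenstates $|\chi_i\rangle$. Since the $|\chi_i\rangle$
form a basis, the equation holds for each $i$. Rearranging, we get
$$
c_i = \langle r = 0|E\rangle\, \frac{a_i}{E - \lambda_i}.
$$
Substituting into $|E'\rangle = \sum_i c_i |\chi_i\rangle$, we get
$$
|E'\rangle = \langle r = 0|E\rangle \sum_i \frac{a_i}{E - \lambda_i}\, |\chi_i\rangle.
$$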
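Taking the coefficient at the root:
$$
y(E) = \frac{\langle \mathrm{root}|E\rangle}{\langle r = 0|E\rangle}
     = \frac{\langle \mathrm{root}|E'\rangle}{\langle r = 0|E\rangle}
     = \sum_i \frac{a_i}{E - \lambda_i}\,\langle \mathrm{root}|\chi_i\rangle
     = \sum_i \frac{|a_i|^2}{E - \lambda_i},
\tag{2.2}
$$
where we used $\langle \mathrm{root}|\chi_i\rangle = a_i^*$. We are most concerned about the behavior of $y(E)$
when expanded around $E = 0$. Therefore we make such an expansion: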
$$
y(E) = \sum_i \frac{|a_i|^2}{E - \lambda_i}
     = -\sum_i \frac{|a_i|^2/\lambda_i}{1 - E/\lambda_i}
     = -\sum_i |a_i|^2 \left( \frac{1}{\lambda_i} + \frac{E}{\lambda_i^2} + \frac{E^2}{\lambda_i^3} + \cdots \right).
$$
First, we need to explain what happens when $\lambda_i = 0$ for some $i$. If $|a_i|^2$ for one
such $i$ is nonzero, then $y(E)$ has a simple pole at $E = 0$. This corresponds to the
case where the root has value 0, and $y(E) \to \pm\infty$. On the other hand, if the root has
value 1, then we must have $|a_i|^2 = 0$, or $a_i = 0$, for each $i$ such that $\lambda_i = 0$. We will
focus on the case where the root has value 1, so we can ignore the part of the sum
where $\lambda_i = 0$.
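Now we need to interpret the coefficient of $E^n$ in this sum. We will now write
$H$ instead of $H'$ for the Hamiltonian corresponding to the tree alone, since this will
become our primary concern. Let $H^{-1}$ denote the pseudoinverse of $H$. That is, the
operator satisfying $H^{-1}|\chi_i\rangle = \lambda_i^{-1}|\chi_i\rangle$ if $\lambda_i \ne 0$, and $H^{-1}|\chi_i\rangle = 0$ otherwise. Let $H^{-n}$
denote $H^{-1}$ raised to the $n$'th power. We can compute:
$$
\langle \mathrm{root}|H^{-n}|\mathrm{root}\rangle
 = \sum_{i,j} \langle \mathrm{root}|\chi_i\rangle \langle \chi_i|H^{-n}|\chi_j\rangle \langle \chi_j|\mathrm{root}\rangle
 = \sum_{i,\ \lambda_i \ne 0} |a_i|^2 \lambda_i^{-n}.
$$
Substituting in, we obtain:
$$
y(E) = -\sum_{n \ge 0} \langle \mathrm{root}|H^{-n-1}|\mathrm{root}\rangle\, E^n.
\tag{2.3}
$$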
It is easily seen that $H^{-n}|\mathrm{root}\rangle$ is the vector with the smallest norm among all
vectors $|A\rangle$ such that $H^n|A\rangle = |\mathrm{root}\rangle$. In our case the situation can be simplified
further because our graph is bipartite. That is, its nodes can be partitioned into
two parts such that each edge connects two nodes from different parts. The vector
$|\mathrm{root}\rangle$ contains coefficients only in one of the two parts. If $|A\rangle = H^{-1}|\mathrm{root}\rangle$ contains
coefficients in both parts, then we can set its coefficients in the part containing the
root to zero. This will not change $H|A\rangle$, since it only sets the coefficients of
$H|A\rangle$ in the part not containing the root to zero, which they must be in the
first place. Since $|A\rangle$ has the property that its norm is minimized, we can conclude that
$|A\rangle$ contains coefficients only in the part not containing the root. By induction, we
know $|A\rangle = H^{-n}|\mathrm{root}\rangle$ contains coefficients only in the part containing the root if $n$
is even, and only in the other part if $n$ is odd. This shows $\langle \mathrm{root}|H^{-(2n-1)}|\mathrm{root}\rangle = 0$
for every $n \ge 1$.
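Using the hermiticity of $H^{-1}$, which follows from that of $H$, the above equation can
be simplified to give our main result:
$$
y(E) = -\sum_{n \ge 1} \langle \mathrm{root}|H^{-2n}|\mathrm{root}\rangle\, E^{2n-1}
     = -\sum_{n \ge 1} \big\| H^{-n}|\mathrm{root}\rangle \big\|^2 E^{2n-1}.
$$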
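The expansion can be illustrated on a toy graph. The sketch below is an assumption-laden stand-in, not one of the NAND trees of this thesis: it uses a root joined to two leaves as $H$, for which the zero-energy eigenvector has no weight on the root. It compares the spectral sum (2.2) with the series in the norms $\|H^{-n}|\mathrm{root}\rangle\|^2$; the function names are illustrative.

```python
import numpy as np

# Toy stand-in for H: a root (index 0) joined to two leaves (bipartite graph).
H = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
root = np.array([1.0, 0.0, 0.0])

lam, chi = np.linalg.eigh(H)   # columns of chi are the eigenvectors |chi_i>
a = chi.T @ root               # a_i = <chi_i|root>

def y_spectral(E, tol=1e-12):
    """y(E) = sum_i |a_i|^2 / (E - lambda_i), skipping lambda_i = 0."""
    keep = np.abs(lam) > tol
    return np.sum(np.abs(a[keep]) ** 2 / (E - lam[keep]))

H_pinv = np.linalg.pinv(H)     # pseudoinverse, written H^{-1} in the text

def y_series(E, n_terms=20):
    """y(E) = -sum_{n>=1} ||H^{-n}|root>||^2 E^(2n-1), truncated."""
    total = 0.0
    for n in range(1, n_terms + 1):
        v = np.linalg.matrix_power(H_pinv, n) @ root
        total -= (v @ v) * E ** (2 * n - 1)
    return total

print(y_spectral(0.3), y_series(0.3))
```

For this toy graph the closed form is $y(E) = E/(E^2 - 2)$, so both functions should agree with it for $|E| < \sqrt{2}$.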
From the equation, we see that the rate of growth of $y(E)$ around $E = 0$ can be
associated with the norm of the vectors $H^{-n}|\mathrm{root}\rangle$. Another way to see this is to go
back to equation (2.3). The location of the pole of $y(E)$ closest to $E = 0$ gives a good
indication of when $y(E)$ increases to unity. This is the smallest nonzero eigenvalue of
$H$, which becomes the largest eigenvalue of $H^{-1}$, which in general will measure how
fast the norms of $H^{-n}|\mathrm{root}\rangle$ increase as $n$ increases. The fact that the running time
depends on the size of the smallest positive eigenvalue is a reflection of the possible
interpretation of the quantum algorithm as a spectral analysis around $E = 0$; see [6]
for details.
One way to estimate $y(E)$ around $E = 0$ is to directly estimate the largest
eigenvalue of $H^{-1}$. In this section, however, we will just focus on estimating
$a = \|H^{-1}|\mathrm{root}\rangle\|^2$. In many cases $y(E)$ will increase to order unity around $E = a^{-1}$, but
sometimes $a$ can be small while $\|H^{-n}|\mathrm{root}\rangle\|^2$ is large for some $n > 1$, making $y(E)$
increase faster than can be predicted from the size of $a^{-1}$.
By the minimal property of $H^{-1}|\mathrm{root}\rangle$, we know $a \le |A|^2$ for any vector $A$ such
that $HA = |\mathrm{root}\rangle$. Therefore, to give an upper bound on $a$, it suffices to find one
such vector $A$ and compute its norm. Finding such a vector is equivalent to assigning
weights to each node of the tree, such that the sum of weights at neighbors of the root
is 1, and the sum at neighbors of any other node is 0. This can be done according to
the following rules:
* All nodes with value 1 have weight zero.
* For each node $b$ with value 1 whose parent has weight $z$:
  If $b$ has two child nodes both with value 0, then each child node has weight
  $-z/2$.
  If there is a fault at $b$, then every node in the subtree of the child node with
  value 1 has weight 0, and the child node with value 0 has weight $-z$.
* The value of $z$ for the root is $-1$, and the assignment of weights proceeds
  inductively starting from the root.
The nodes attached to the leaves with value 1 of the original NAND tree can be
considered a child node of the corresponding leaf, so the new tree is still labeled by the
extended NAND rule, with every leaf having value 0. See Figure 2-1 for an example.
Figure 2-1: Assignment of weights for a sample tree. Empty circles denote nodes with
value 1, and solid circles denote nodes with value 0.
It is clear that only nodes at even levels have nonzero weights (when the tree is
considered as a bipartite graph, this is exactly the part not containing the root). If
there are no faults, the weights halve every two levels down the tree. This means
the contribution to $|A|^2$ of each node is multiplied by 1/4. There are also four times
more nodes every two levels down the tree. So these changes cancel out and the
contribution of each level is the same. The sum $|A|^2$ is then linear in the height of
the tree.
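The fault-free case can be sketched in code. The example below is illustrative only: it assumes a complete binary tree stored in heap order (node `i` has children `2i+1` and `2i+2`), with root value 1, values alternating by level, and an odd height so that every leaf has value 0. It applies the weight rules and checks that neighbor sums are 1 at the root and 0 elsewhere. The helper names are hypothetical.

```python
import numpy as np

def fault_free_weights(height):
    """Weights for a fault-free NAND tree with root value 1 (heap order).

    Assumes odd height, so levels alternate 1, 0, 1, ..., 0 and leaves have value 0.
    """
    assert height % 2 == 1
    n_nodes = 2 ** (height + 1) - 1
    w = np.zeros(n_nodes)
    for i in range(1, n_nodes):
        level = (i + 1).bit_length()       # root is at level 1
        if level % 2 == 0:                 # value-0 node; its parent has value 1
            parent = (i - 1) // 2
            # z is the weight of the parent's parent, or -1 for the root.
            z = -1.0 if parent == 0 else w[(parent - 1) // 2]
            w[i] = -z / 2
    return w

def neighbor_sums(w):
    """For each node, the sum of weights over its neighbors in the tree."""
    s = np.zeros(len(w))
    for i in range(1, len(w)):
        p = (i - 1) // 2
        s[i] += w[p]
        s[p] += w[i]
    return s

for height in (1, 3, 5):
    w = fault_free_weights(height)
    print(height, np.sum(w ** 2))
```

In this sketch each even level contributes 1/2 to $|A|^2$, so the total grows linearly with the height.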
At every fault in the tree, the weight at the child node with value 0 is twice the
weight that node would have if there were no fault. A simple upper bound for $|A|^2$ for
trees satisfying the $k$-faults rule is then found as follows: for each node in the tree,
there are at most $k$ faults in the path from the root to that node. At each fault the
ratio of the actual weights on the path to the weights that would be assigned if there
were no faults at all will either double or become zero. This means the contribution
to $|A|^2$ from each node cannot be greater than $(2^k)^2 = 4^k$ times what it is before.
Combined with the result in the previous paragraph, this shows $|A|^2$ is of order at
most $n \cdot 4^k$ (this can be reduced to $n \cdot 2^k$ with more careful arguments).
As we stressed before, the linear term is not the whole story. One way to see this
is that we have not used all the conditions in the $k$-faults rule. Indeed our argument
seems to work regardless of what happens at nodes in the side of a fault rooted at
the node with value 1, since all these nodes have weight zero. However, when we
go to the next power $H^{-2}|\mathrm{root}\rangle$, some of these nodes will be assigned weights, locally
according to the rules given above. If the subtree descending from the child node with value 1
contains many faults, the norm of $H^{-2}|\mathrm{root}\rangle$ will be large, and therefore $y(E)$ will still
grow quickly.
On the other hand, this view also indicates that there are trees that do not satisfy
the $k$-faults rule for any small $k$, but still have slow-growing $y(E)$. One example includes
trees that have faults at every node along some path (say the right-most path), and
no fault anywhere else. The most difficult case among these trees is when the values
of nodes alternate between 0 and 1 along the path, so the weights on the path never
become identically zero (see Figure 2-2). Considering the assignment of weights
on this tree, it is clear that $|A|^2$ is quadratic in $n$, so the running time can still be
polynomial in $n$. This appears to be the case from numerical computations of $y(E)$.

Figure 2-2: Example of a tree that does not satisfy the $k$-faults rule for small $k$, but still
has $|A|^2$ polynomial in $n$. Empty circles denote nodes with value 1, and solid circles
denote nodes with value 0. Ellipsis denotes attaching a subtree without any faults to
make the leaves of the resulting tree at the same height.
Chapter 3
Classical Lower Bound
3.1 Statement of the Theorem
In this chapter we will give lower bounds on the cost of classical algorithms for
evaluating one class of boolean formulas with restricted inputs. Recall that we are
considering boolean formulas that can be represented as a balanced $c$-ary tree, with
each node acting as a gate, evaluating a function $f(x_1, \ldots, x_c)$ of its child nodes. The
set of all possible inputs to a gate is partitioned into two types: trivial and nontrivial.
We call a node in the tree a fault if it is not a leaf and the input given by its child
nodes is nontrivial. We restrict the input to the overall formula so that along each
path from the root to the leaves, there are at most $k$ faults for some given $k$ (this will
be called the $k$-faults condition). Let $n$ be the height of the tree, measured as the number of
edges on any path from the root to the leaves. We say the root is at level 1 of the
tree, so the leaves are at level $n + 1$.
The main goal of this chapter is the following theorem.
Theorem 3.1.1. Consider a function $f : \{0,1\}^c \to \{0,1\}$ and a partition of the inputs of
$f$ into two types, trivial and nontrivial. Suppose the following two conditions hold:
1. There exists $X = (x_1, \ldots, x_c) \in \{0,1\}^c$ such that both $X$ and $\bar X = (\bar x_1, \ldots, \bar x_c)$
are trivial, and $f(X) \ne f(\bar X)$.
2. There exist $k_0$ and $n_0$ such that for each $r \in \{0,1\}$, there is a distribution $\mathcal{G}_r$ of
trees with height $n_0$, root evaluating to $r$, and satisfying the $k_0$-faults rule, such that
the probability of any leaf evaluating to zero is exactly 1/2 in this distribution.
Then for large $n$ and $k$ polynomial in $\log n$, no probabilistic algorithm can obtain
the value of the root of every tree with height $kn$ and satisfying the $(k k_0)$-faults rule
with confidence at least 2/3 before evaluating $\beta^k$ leaves, where $\beta = [(\log \tilde n)/5]$ and
$\tilde n = n - n_0$.
Both the NAND gate and the 3-MAJ gate satisfy the conditions of the theorem.
For the first condition, we can take $x_i = 0$ in both cases. For the second condition, one
distribution for the NAND gate is shown in Figure 3-1, plus a suitable permutation
of inputs. One distribution for the 3-MAJ gate is shown in Figure 3-2, again plus a
suitable permutation of inputs. Here the case for $r = 1$ is symmetric.

Figure 3-1: A sample distribution for the NAND gate

Figure 3-2: A sample distribution for the 3-MAJ gate
The first condition allows the following construction: given a root value $r \in \{0,1\}$
and height $n$, there is a unique tree of height $n$ such that for any non-leaf node $a$
in the tree, the values on the child nodes of $a$ are either $X$ or $\bar X$. Moreover the tree
with root 0 and the tree with root 1 differ at every node of the tree.
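The construction can be sketched for the NAND gate with $X = (0, 0)$. This is an illustrative example only; the function `forced_tree` and its level-by-level list representation are assumptions made for the sketch.

```python
def nand(xs):
    """NAND of a tuple of bits."""
    return 0 if all(xs) else 1

def forced_tree(root_value, height, X=(0, 0), gate=nand):
    """Levels of the unique tree whose non-leaf nodes all have children X or X-bar."""
    X_bar = tuple(1 - x for x in X)
    assert gate(X) != gate(X_bar)              # condition 1 of the theorem
    children_for = {gate(X): X, gate(X_bar): X_bar}
    levels = [[root_value]]
    for _ in range(height):
        levels.append([c for v in levels[-1] for c in children_for[v]])
    return levels

t0 = forced_tree(0, 4)
t1 = forced_tree(1, 4)
print(t0[-1])
print(t1[-1])
```

Since a node's value determines which of $X$, $\bar X$ its children carry, the two trees are complementary level by level, which is why evaluating a single leaf fixes the root.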
3.2 Preparation
We choose $X$, $n_0$, $k_0$ and $\mathcal{G}_r$ according to the conditions of the theorem. Fix $n$ sufficiently
large (in particular $n > n_0$). To prove the theorem, we construct for each
$k \ge 1$ a distribution $\mathcal{T}_k$ of trees with height $kn$ and satisfying the $(k k_0)$-faults rule.
We show that no deterministic algorithm can obtain the root value of a randomly
chosen tree from $\mathcal{T}_k$ with probability at least 2/3 before evaluating $\beta^k$ leaves. It is
clear that this implies Theorem 3.1.1.
The distribution $\mathcal{T}_k$ is constructed by induction on $k$. For $k = 1$ the construction
is as follows: first the root value $r \in \{0,1\}$ and an integer $i \in \{1, 2, \ldots, \tilde n\}$ are chosen
uniformly. Then the first $i$ levels of the tree are specified by requiring that for each
node $a$ in the first $i - 1$ levels, the child nodes of $a$ have values either $X$ or $\bar X$ as
given in condition 1. By the discussion above this choice is unique. For each node $b$
at level $i$, we choose the next $n_0$ levels of the tree rooted at $b$ from the distribution
$\mathcal{G}_{v(b)}$, where $v(b)$ denotes the value of the tree at node $b$. Doing this independently
for each node at level $i$, we obtain the next $n_0$ levels of the tree. Finally, we complete
the tree by requiring that each non-leaf node at or below level $i + n_0$ has child nodes
with value either $X$ or $\bar X$. This completes the construction for $k = 1$.
For any $k > 1$, we construct the first $n + 1$ levels of the tree according to the
distribution $\mathcal{T}_1$. Then for each node $b$ at level $n + 1$, we choose the subtree rooted at
$b$ independently from the distribution $\mathcal{T}_{k-1}$. This gives a distribution $\mathcal{T}_k$. It is clear
that any tree in $\mathcal{T}_k$ will have height $kn$ and satisfy the $(k k_0)$-faults rule. For any $k$,
we say a tree in $\mathcal{T}_k$ has category $i$ if $i \in \{1, 2, \ldots, \tilde n\}$ is chosen during the construction
of the first $n + 1$ levels of the tree.
Now we will follow a deterministic algorithm trying to determine the value at the
root of a random tree from these distributions. Note that at any point during the
computation, we know exactly which trees in the distribution are not excluded by the
known values of the leaves. Let $p_0$, $p_1$ be the probability (in the distribution) that
a random tree remaining has root value 0 and 1, respectively. Then the algorithm
should guess 1 if $p_1 > 1/2$ and 0 if $p_1 < 1/2$. Moreover the probability that the
algorithm will be correct is
$$
\max(p_0, p_1) = \frac{1}{2} + \frac{|p_0 - p_1|}{2}.
$$
Naturally we will try to bound $|p_0 - p_1|$, which we call the confidence level.
Let $p_r(i)$, $r \in \{0,1\}$, $1 \le i \le \tilde n$, be the relative probability that a random tree
remaining has root value $r$ and category $i$. Write $S = \sum_i p_0(i) + \sum_i p_1(i)$. Note that
we have not specified $S = 1$ or any other normalization. With any normalization, we
have
$$
|p_0 - p_1| = \frac{\left|\sum_i p_0(i) - \sum_i p_1(i)\right|}{S} \le \frac{\sum_i |p_0(i) - p_1(i)|}{S}.
$$
Write $D = \sum_i |p_0(i) - p_1(i)|$ (for difference). Our general approach is to give, up
to some small probability, an upper bound on $D$ and a lower bound on $S$.
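The relation between the confidence level and the ratio $D/S$ can be sketched as follows. The arrays `p0` and `p1` stand for the relative probabilities $p_0(i)$ and $p_1(i)$; the numerical values are made up purely for illustration.

```python
import numpy as np

def confidence_and_bound(p0, p1):
    """Return (|p_0 - p_1|, D / S) for relative probabilities p_r(i)."""
    p0 = np.asarray(p0, dtype=float)
    p1 = np.asarray(p1, dtype=float)
    S = p0.sum() + p1.sum()
    D = np.abs(p0 - p1).sum()
    confidence = abs(p0.sum() - p1.sum()) / S
    return confidence, D / S

# Hypothetical relative probabilities for three categories.
conf, bound = confidence_and_bound([0.5, 0.2, 0.0], [0.1, 0.4, 0.3])
print(conf, bound)
```

The bound is insensitive to the normalization of the $p_r(i)$, which is why later sections are free to rescale them.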
For the lower bound on S, we need the following lemma:
Lemma 3.2.1. Fix $A_0 > 0$. For each integer $t > 0$ we choose a number $0 \le p_t \le 1$
(knowing the value of $A_{t-1}$). Then $A_t = p_t A_{t-1}$ with probability $p_t$, and
$A_t = (1 - p_t)A_{t-1}$ otherwise. For any $m \in \mathbb{Z}^+$ and $0 < C < 1$, the probability that $A_m < CA_0$
is less than $2^m C$, regardless of the choices of $p_t$.
Proof. First let $A_0 = 1$. We can model the process as trying to bound a real number
$x$ uniformly distributed between 0 and 1, where at each step $t$ we choose a number
$y_t$ and are told whether $x < y_t$. Here $A_t$ corresponds to the size of the interval that
is not excluded after $t$ steps, and $p_t$ corresponds to the relative position of $y_t$ within
the non-excluded interval ($p_t = 0$ if $y_t$ is to the left of the non-excluded interval, and
$p_t = 1$ if $y_t$ is to the right).
After $m$ steps, each of the $2^m$ possible strings of answers we receive corresponds
to some subinterval of $[0, 1]$. These subintervals cover $[0, 1]$ without overlaps (if some
string of answers is impossible, let it correspond to the empty interval). The condition
$A_m < C$ corresponds to $x$ falling in an interval of size less than $C$. Since there are
just $2^m$ intervals, the combined length of intervals with size less than $C$ is less than
$2^m C$. So the probability that $x$ will fall into one of these intervals is less than $2^m C$.
It is clear that for general $A_0 > 0$, the same conclusion holds with $A_m < C$
replaced by $A_m < CA_0$. $\square$
In this chapter we will use $A_0 = 2\tilde n$, $m = \beta = [(\log \tilde n)/5]$, and $C = \tilde n^{-2/5}$. So
$A_m < 2\tilde n^{3/5}$ with probability less than $p = 2^m C \le \tilde n^{-1/5}$. This particular choice is
used to ensure that $\tilde n/A_m^2 = o(1)$ with high probability.
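A quick numerical check of these parameter choices is sketched below. It assumes that the logarithm is base 2 and that the bracket denotes rounding down; neither is stated explicitly in this passage, so both are assumptions of the example.

```python
import math

def parameters(n_tilde):
    """Return (beta, C, 2^beta * C) under the assumptions stated above."""
    beta = math.floor(math.log2(n_tilde) / 5)   # assumed: base-2 log, floor
    C = n_tilde ** (-2 / 5)
    return beta, C, 2 ** beta * C

for n_tilde in (10 ** 3, 10 ** 6, 2 ** 20, 10 ** 9):
    beta, C, p = parameters(n_tilde)
    print(n_tilde, beta, p, n_tilde ** (-1 / 5))
```

With these conventions $2^\beta \le \tilde n^{1/5}$, so the failure probability $2^m C$ never exceeds $\tilde n^{-1/5}$.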
3.3 Case k = 1
Now we begin the proof for the $k = 1$ case. We fix the normalization of $p_r(i)$ so it
equals the probability that a random tree with root value $r$ and category $i$ from the
distribution $\mathcal{T}_1$ is not excluded by the known values of leaves. So at the beginning
of the computation, we have $p_0(i) = p_1(i) = 1$ for all $1 \le i \le \tilde n$, and $S = 2\tilde n$. Then
Lemma 3.2.1 can be applied as follows: suppose the algorithm chooses to evaluate
leaf $b$ at step $t$; then $p_t$ corresponds to the probability that a random tree remaining
after step $t - 1$ has value 0 at $b$, and $A_t$ corresponds to the value of $S$ after $t$ steps.
By the lemma we know that after $m = \beta$ steps, the probability that $S < 2\tilde n^{3/5}$ is less
than $\tilde n^{-1/5}$.
For the upper bound on $D$, we need to examine how the values $p_0(i)$ and $p_1(i)$
are updated after evaluating each leaf. The main property of the construction is the
following: for any node $b$ at level $i$, where $i$ is the category of the tree, if $a$ is the
first leaf descending from $b$ that is evaluated, then the probability that $a$ evaluates
to 0 is exactly 1/2, even conditioning on the value at $b$. For any $r \in \{0,1\}$ and $i$, it
follows that $p_r(i) = 1/2$ after the first step. Now suppose leaf $a$ is evaluated at step
$t > 1$. Let $h$ be the level of the first node where the path from root to $a$ breaks off
the tree formed by the already evaluated leaves. More precisely, let $E$ be the set of
leaves already evaluated, and represent both $a$ and each $b_j \in E$ by coordinates in $[c]^n$
coding the path taken to reach the leaf. Then $h$ is the number of initial coordinates
that agree between $a$ and $b_j$, maximized over $b_j \in E$. Let $E' \subseteq E$ be the subset of
already evaluated leaves that agree with $a$ to height $h$. We will examine how each
$p_r(i)$ is updated upon learning the value of $a$. There are three cases:
Case 1, $i + n_0 \le h$: Here we are conditioning on the faults occurring before the last
split. In particular the leaves $a$ and $b_j \in E'$ are contained in a subtree formed entirely
of inputs $X$ and $\bar X$. Recall that in such a subtree the value of one leaf determines the
value of all others. If the values of $E'$ are inconsistent with each other, then $p_r(i) = 0$
for all $i + n_0 \le h$ to begin with. Otherwise, the value of $a$ can be predicted from the
values of $b_j \in E'$. If the actual value of $a$ is the same, then $p_r(i)$ remains the same.
Otherwise $p_r(i) = 0$ after evaluating $a$.
Case 2, $i > h$: Here $a$ is the first leaf we evaluated that descends from some
node $b$ at level $i$. The probability that $a$ has value 1 is exactly 1/2, even if we know
something about the value of $b$. So $p_r(i)$ is reduced by a half for all $i > h$.
Case 3, $i \le h < i + n_0$: We cannot say anything about these cases without knowing
the details of the function $f$. However, the increase of $|p_0(i) - p_1(i)|$ due to these cases
is at most $n_0$.
Neither case 1 nor case 2 involves any increase in $D$. So the increase is by at most
$n_0$ for each step. Since $D = 0$ at the beginning, we know $D \le \beta n_0$ after $\beta$ steps.
Combining with the lower bound on $S$, we have
$$
|p_0 - p_1| \le \frac{D}{S} \le \frac{n_0 \beta}{2\tilde n^{3/5}}
$$
with probability at least $1 - \tilde n^{-1/5}$.
3.4 General k
We will prove by induction statements of the following form: for a given $1 \le l \le k$,
if less than $\beta^l$ leaves are evaluated on a random tree from the distribution $\mathcal{T}_k$, then
with probability $1 - p_{k,l}$, the confidence level $|p_0 - p_1|$ is at most $c_{k,l}$. For $k = 1$, we
proved this for $l = 1$, $p_{1,1} = \tilde n^{-1/5}$, and $c_{1,1} = n_0\beta/(2\tilde n^{3/5})$.
In general, we will show the following:
Pk,k < 5-1/5, pj,
and ck,l ;<
< 4 1 25-(k-l+1)/5
for 1 < k,
(3.1)
(3.2)
(8no) k-1+13(k-1+1)/503+k-1.
where $a \lesssim b$ means $a < b(1 + C\beta^{\alpha}\tilde n^{\gamma})$, for fixed constants $\alpha > 0$ and $\gamma < 0$, and $C$
grows polynomially in $k$ and $l$. The $l = k$ part of the above statements easily implies
the theorem.
The proof naturally has three parts: bounds on $S$, $D$ and $p_{k,l}$ for each $k$, $l$. But
first, we shall lay down a framework describing when the algorithm is 'lucky' during
the computation.
3.4.1 Exceptions
The concept of exception is defined inductively as follows. For $k = l = 1$, an exception
occurred if the algorithm is lucky in the division process specified in Lemma 3.2.1 (so
the probability is bounded by $\tilde n^{-1/5}$; from now on we will just say the division process
is lucky). For $k > 1$, let $L$ be the set of nodes at height $n + 1$. We will imagine the
algorithm as trying to evaluate leaves descending from different $b \in L$, then piece
together information about nodes in $L$ to guess the value at the root. It doesn't
matter whether the algorithm is following this type of strategy, since any sequence of
evaluations is allowed in this view, and in principle we can calculate exactly which
root value the algorithm should return, along with the confidence level. For $l < k$, an
exception occurred if exceptions occurred for at least two of the subtrees (with roots in
$L$, same below). If this does not happen, then we will show that the algorithm has
high confidence in the value of at most one $b \in L$. Since the first sure value of a node
in $L$ simply sets each $p_r(i)$ to 1/2, the division process does not properly begin. If
$l = k > 1$, an exception occurred if exceptions occurred for at least two of the subtrees,
or if the division process is lucky. In this case, we may evaluate $\beta^{k-1}$ or more nodes
on a subtree (which is in the distribution $\mathcal{T}_{k-1}$). For these subtrees we have no bound
on the confidence level. The number of such subtrees is at most $\beta - 1$. If at most one
exception occurred on subtrees, then the algorithm has high confidence on at most
$\beta - 1 + 1 = \beta$ nodes in $L$, which is just the maximum number of steps in the division
process.
In the next two subsections, we will give bounds on $S$ and $D$ given that exceptions
did not occur. By induction, we may use the value of $c_{k-1,i}$ for any $1 \le i \le k - 1$
when considering the case $(k, l)$. Also, to simplify the argument, we give the algorithm
the correct value of a node $b \in L$ after $\beta^{k-1}$ leaves descending from that node are
evaluated, or after an exception occurred at $b$. Equivalently we give the algorithm the
values of all leaves descending from $b$. Clearly the algorithm cannot do worse with
this help. In the last subsection, we will bound the probability of an exception, which
becomes $p_{k,l}$.

3.4.2 Bounds on S
First, we will describe how $p_r(i)$ is calculated in the case $k > 1$. For each $b \in L$ and
$r \in \{0,1\}$, let $p(b, r)$ be the probability that a randomly chosen tree with root value $r$
in $\mathcal{T}_{k-1}$ is not excluded by the known values of leaves descending from $b$. For example,
at the start of the computation $p(b, 0) = p(b, 1) = 1$. If enough leaves are evaluated
so that we are sure from the leaves descending from $b$ alone that $b$ has value 0, then
$p(b, 1) = 0$.
For each tree $t \in \mathcal{T}_1$ and $b \in L$, let $v(t, b)$ be the value of $t$ at $b$. Then
$$
p(t) = \prod_{b \in L} p(b, v(t, b))
$$
gives the probability that a randomly chosen tree from $\mathcal{T}_k$ with first $n + 1$ levels equal
to $t$ is not excluded. So we can let $p_r(i)$ be the weighted average of $p(t)$ for all $t \in \mathcal{T}_1$
such that $t$ has root value $r$ and category $i$.
The values of $p(b, r)$ and $p(t)$ calculated this way may be extremely small. To
make them easier to handle, we will change the normalization of $p(b, r)$ as follows. If
we have given the algorithm the correct value of $b$, as specified at the end of the last subsection,
then $p(b, r) = 0$ for some $r \in \{0,1\}$, and we reset $p(b, \bar r) = 1$ for the other value $\bar r = 1 - r$. Otherwise, by induction
we know the confidence level for the value of $b$ is low, so the relative difference
between $p(b, 0)$ and $p(b, 1)$ is very small. We multiply both $p(b, 0)$ and $p(b, 1)$ by the
same constant so that $p(b, 0) + p(b, 1) = 2$, so both $p(b, 0)$ and $p(b, 1)$ are close to 1.
Then $p(t)$ and $p_r(i)$ are calculated from $p(b, r)$ as above. This is simply a change of
normalization, so the $p_r(i)$ are still relative probabilities.
Now we will make precise the claim that $p(b, 0)$ and $p(b, 1)$ are close to 1 before
either an exception or $\beta^{k-1}$ evaluations on $b$. If no exception occurred at $b$ and the
number of evaluations on $b$ is less than $\beta^j$, then $|p_0 - p_1| \le c_{k-1,j}$ for the value at $b$.
This is
$$
|p_0 - p_1| = \frac{|p(b, 0) - p(b, 1)|}{p(b, 0) + p(b, 1)} = \frac{|p(b, 0) - p(b, 1)|}{2} \le c_{k-1,j},
$$
so $|p(b, r) - 1| \le c_{k-1,j}$ for $r \in \{0,1\}$.
In particular, suppose $p(b, r)'$ is the value of $p(b, r)$ after evaluating a leaf descending
from $b$, without obtaining the correct value of $b$. Then $p(b, r)' = p(b, r)\lambda$,
where
$$
\lambda \in \left[ \frac{1 - c_{k-1,j}}{1 + c_{k-1,j}},\ \frac{1 + c_{k-1,j}}{1 - c_{k-1,j}} \right],
\qquad |\log \lambda| \lesssim 2c_{k-1,j},
\tag{3.3}
$$
assuming by induction that $c_{k-1,j}$ is small.
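The step from the interval for the multiplier to the logarithmic bound in (3.3) can be spelled out as follows. This is a short worked expansion, writing $c$ for $c_{k-1,j}$ and assuming both the old and the new value of $p(b, r)$ lie in $[1 - c, 1 + c]$ after renormalization:

```latex
\[
|\log \lambda| \;\le\; \log\frac{1+c}{1-c}
  \;=\; 2\left(c + \frac{c^3}{3} + \frac{c^5}{5} + \cdots\right)
  \;=\; 2c\,\bigl(1 + O(c^2)\bigr).
\]
```

So the factor 2 in (3.3) is exact to leading order, and the correction is second order in $c$.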
Now we begin to estimate $S$, assuming that an exception did not happen overall.
First the case $l = k$. We have at most $\beta$ sure values for nodes $b \in L$, including
$\beta - 1$ from evaluating $\beta^{k-1}$ or more leaves on nodes in $L$, and 1 from a possible
exception at some $b \in L$. When a sure value is given, we update the value of $p(b, r)$
(and therefore $p_r(i)$) by first setting one of the $p(b, r)$ to zero and then changing the other
to one. Therefore at most $\beta$ times during the computation, some $p(b, r)$ is set to
zero. At all other times $p(b, r)$ simply fluctuates around one.
We claim that the decrease in $S$ due to setting some of the $p(b, r)$ to zero is modeled
by the division process in Lemma 3.2.1. First consider the case where we give the
value of $b \in L$ to the algorithm after evaluating $\beta^{k-1}$ nodes descending from $b$. In
this case evaluating the $\beta^{k-1}$'th node descending from $b$ can be considered as a query
for the value at $b$. Let $p_t$ (in the lemma) be the probability that a randomly chosen
tree from those remaining has 0 at $b$ (this is not necessarily $p(b, 0)/2$). Then the
probability that $b$ indeed has value 0 is exactly $p_t$. Also, if $b$ turns out to have value
0, then setting $p(b, 1) = 0$ without changing $p(b, 0)$ will change $S$ to $p_t S$. This gives
a correspondence to the situation in the lemma. Now consider the case where we
give the value of $b \in L$ due to an exception at $b$. Note that whether an exception occurred
depends only on the leaves evaluated and their values. So whenever the algorithm
evaluates a leaf $a$ descending from $b$, it knows which values of $a$ will produce an
exception at $b$. If both values of $a$ will, then the situation is the same as the previous
case. If one of the values of $a$, say 0, will produce an exception, then let $p_t$ be the
probability that a randomly chosen tree from those remaining that has 0 at $a$ will
have 0 at $b$ (same probability as above, except conditioning on $v(a) = 0$). In the
event of an exception at $b$, $b$ has value 0 with probability $p_t$, and $S \to p_t S$ in
this case. If no exception occurred, then we do not consider this step as part of the
division process.
Given this correspondence, and also assuming that the division process is not lucky,
we conclude that the decrease in $S$ due to sure values of nodes in $L$ is by a factor of at
most $\tilde n^{-2/5}$. That is, the decrease in $\log S$, written as $\Delta \log S$, is at most $(2 \log \tilde n)/5$.
Now we estimate the decrease in $\log S$ due to other nodes. For each $b \in L$, let
$M(b)$ be the number of leaves descending from $b$ that have been evaluated. For each
$j \le l$, let $A_j = \{b \in L \mid \beta^{j-1} \le M(b) < \beta^j\}$. Note the sets $A_j$ depend on where
we are during the computation. If there is an exception at $b \in L$, we will exclude $b$
from any $A_j$. We have the obvious bound that $|A_j| \le \beta^{l-j+1} - 1$. Nodes in $A_k$ will
produce sure values which we considered above. For $A_j$ with $j < k$, we will repeatedly
make use of Equation 3.3.
There are at most $\beta$ times when some $p(b, r)$ is set to zero. Consider the (at
most) $\beta + 1$ intervals in between. Since we will no longer use information about what
the algorithm knows during the computation (only Lemma 3.2.1 makes use of this
information), we can rearrange the order of evaluations within each interval, so that
evaluations descending from the same $b \in L$ are placed together. For each interval,
consider the sets $A_j$ just before the end of the interval. For each $b \in A_j$, $j < k$, the
total change in $\log p(b, r)$ during the interval is bounded by $2c_{k-1,j}$ (here we also count
the change of $p(b, r)$ to 1 immediately after $b$ gives an exception). This is also the
maximum possible change in $p_r(i)$ and therefore $S$. Summing these together, we have
$$
\Delta \log S \lesssim \frac{2 \log \tilde n}{5} + 2(\beta + 1)\left(c_{k-1,k-1}\beta^2 + c_{k-1,k-2}\beta^3 + \cdots\right).
$$
Using the values of $c_{k-1,j}$, we have $\Delta \log S \lesssim (2 \log \tilde n)/5$. So $S \gtrsim 2\tilde n^{3/5}$.
Now consider the case $l < k$. We will not use Lemma 3.2.1 at all, so we may
rearrange the evaluations to put those for the same $b \in L$ together, also putting
evaluations for the only exception at the front (if there is any). The possible exception
will simply reduce each $p_r(i)$, and therefore $S$, by a half. Adding to this the reduction of
$S$ due to other nodes, we have
$$
\Delta \log S \lesssim \log 2 + 2\left(c_{k-1,l}\beta + c_{k-1,l-1}\beta^2 + \cdots\right) \lesssim \log 2.
$$
So $S \gtrsim 2\tilde n/2 = \tilde n$.

3.4.3 Bounds on D
Again, we rearrange the evaluations to put those for the same $b \in L$ together,
putting evaluations for those $b \in L$ for which we have sure values at the front. So in
the first phase, the evolution of $p_r(i)$ is just the same as in the $k = 1$ case.
At the end of this phase, each $p(b, r)$ is either 1 or 0. For each $b \in L$, let $p_r(i, b)$
be the probability that a remaining tree with root $r$ and category $i$ will have 0 at
$b$ (clearly we only need to consider those $i$, $r$ for which $p_r(i) \ne 0$). By the same
reasoning as in the case $k = 1$, there is an integer $h$ such that $p_r(i, b) = 1/2$ for
$i > h$, and either 0 or 1 for $i + n_0 \le h$. If $p_r(i, b) = 0$ or 1 it will not change again.
Suppose leaves descending from $b$ are evaluated during the second phase, and consider
all nodes $b' \in L$ visited before $b$. For each such $b' \in A_j$ (where the $A_j$ are constructed at
the end of the computation), the change in $\log p(t)$ for each tree $t \in \mathcal{T}_1$ due to $b'$ is at
most $2c_{k-1,j}$ as before. This shows that for root value $r$ and category $i$, the change
in $\log p_r(i, b)$ is at most $4c_{k-1,j}$. Summing over $b' \in L$ visited before $b$, we have
$$
\Delta \log p_r(i, b) \lesssim 4\left(c_{k-1,l}\beta + c_{k-1,l-1}\beta^2 + \cdots\right),
$$
where the first term is omitted for the case $l = k$ (and same below). This means for
$i > h$, we have
$$
\left| p_r(i, b) - \frac{1}{2} \right| \lesssim 4\left(c_{k-1,l}\beta + c_{k-1,l-1}\beta^2 + \cdots\right),
\tag{3.4}
$$
just before leaves descending from $b$ are evaluated.
Now we begin to estimate $D$. Suppose $b \in L$ is visited in the second phase,
with $b \in A_j$. Since $b$ is not visited in the first phase, we have $p(b, 0) = p(b, 1) = 1$
beforehand. Now let $p(b, 0)$ and $p(b, 1)$ denote their values after visiting $b$. We have
$|p(b, r) - 1| \le c_{k-1,j}$. Let $p_r(i)_0$ be the value of $p_r(i)$ before visiting $b$, and let $p_r(i)_1$
be the value after. Then we have
$$
p_r(i)_1 = p_r(i)_0 \left[ p_r(i, b)\,p(b, 0) + (1 - p_r(i, b))\,p(b, 1) \right].
$$
In particular
$$
p_r(i) \lesssim 1 + 2\left(c_{k-1,l}\beta + c_{k-1,l-1}\beta^2 + \cdots\right) \lesssim 1
$$
throughout the computation.
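The case analysis that follows can be read off from a single identity. As a worked step (introducing the shorthand $q_r = p_r(i, b)$, $u = p(b, 0)$, $v = p(b, 1)$, which is not notation from the original text), the update rule gives $p_r(i)_1 = p_r(i)_0\,[v + q_r(u - v)]$, and therefore:

```latex
\[
p_0(i)_1 - p_1(i)_1
  = \bigl(p_0(i)_0 - p_1(i)_0\bigr)\,\bigl[v + q_0\,(u - v)\bigr]
  \;+\; p_1(i)_0\,(q_0 - q_1)\,(u - v).
\]
```

The first term rescales the existing difference by a factor close to 1, and the second is an additive contribution that vanishes when $q_0 = q_1$ and is otherwise controlled by $|q_0 - q_1|\,|u - v|$.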
There are three cases for $i$:
Case 1, $i + n_0 \le h$: we know $p_0(i, b)$ and $p_1(i, b)$ are either both 0 or both 1. So
$p_0(i)$ and $p_1(i)$ are multiplied by the same factor, which is less than $1 + 2c_{k-1,j}$. This has the effect
of at most multiplying $D$ by $1 + 2c_{k-1,j}$.
Case 2, $i > h$: in addition to the same effect above, there is another addition to
$D$ caused by the difference between $p_0(i, b)$ and $p_1(i, b)$. This difference is bounded
by Equation 3.4. The addition to $D$ from this source is at most
$$
\left(\max_r p_r(i)_0\right) \left| (p_0(i, b) - p_1(i, b))(p(b, 0) - p(b, 1)) \right| \lesssim 8\left(c_{k-1,l}\beta + \cdots\right) \cdot 2c_{k-1,j}.
$$
Case 3, $i \le h < i + n_0$: we have no assumption on $p_r(i, b)$. But $|p(b, 0) - p(b, 1)|$
is still bounded. So the addition to $D$ from this part is at most
$$
\left(\max_r p_r(i)_0\right) n_0\, |p(b, 0) - p(b, 1)| \lesssim n_0 \cdot 2c_{k-1,j}.
$$
Summing these together for all $b$, putting multiplications from cases 1 and 2 after
the additions from cases 2 and 3, we have
$$
D \lesssim \left[ D_0 + 16\tilde n\left(c_{k-1,l}\beta + \cdots\right)^2 + 2n_0\left(c_{k-1,l}\beta + \cdots\right) \right]\left(1 + 2c_{k-1,l}\beta + \cdots\right),
$$
where $D_0$ is the value of $D$ after the first phase, which by the argument in the case
$k = 1$ is at most $2n_0\beta$ if $l = k$ and 0 if $l < k$. The point is that any $c_{k,l}$ contains a
factor of $\tilde n^{-\alpha}$ with $\alpha \ge 3/5$, so any term other than $D_0$ in the expression above is at
most of order $\tilde n^{-1/5}$. If $l = k$, then $D_0$ dominates the sum, so for this case $D \lesssim 2n_0\beta$.
If $l < k$, from the values of $c_{k-1,j}$ we see that the terms $16\tilde n(c_{k-1,l}\beta)^2$ and $2n_0 c_{k-1,l}\beta$
dominate.
Combining the bounds on $D$ and $S$, we have for $l = k$,
$$
c_{k,k} = \frac{D}{S} \lesssim \frac{2n_0\beta}{2\tilde n^{3/5}},
$$
and for $l < k$, it is a straightforward calculation to verify that
$$
c_{k,l} \lesssim (8n_0)^{k-l+1}\, \tilde n^{-3(k-l+1)/5}\, \beta^{3+k-l}
$$
satisfies the recursion relation.

3.4.4 Bounds on $p_{k,l}$
Now we begin the proof of the bound on $p_{k,l}$. By induction we can use the values
$p_{k-1,j}$, which give the probability of getting exceptions at subtrees (rooted at nodes
in $L$, same below). We say an overall exception is of type I if it is caused by at least
two exceptions on subtrees, and type II if it is a result of luckiness in the division
process.
We make two observations to prepare for the proof. First, note that the condition
for getting an exception at the root is the same among all $l$ less than $k$. Moreover
this condition is the same as the condition for getting a type I exception for the case
$l = k$. For $l = k$, if less than $\beta^{k-1}$ leaves are evaluated, then the division process
has not begun, so the probability of getting a type II exception conditioning on the
values known so far is still bounded by $\tilde n^{-1/5}$.
Second, note that for large $n$, the number of nodes in $L$ is far greater than the
number of evaluations we have. This means we can combine evaluations of different
subtrees into the same subtree. Let us make this more precise, since it will be used
repeatedly below. Suppose an algorithm evaluates leaves from subtrees rooted at
$b_1, \ldots, b_m$. The algorithm may interleave evaluations on these subtrees. We may
simulate this process on a single subtree rooted at $b$ as follows: for each $1 \le j \le m$,
we set aside a sufficiently large set $B_j$ of subtrees of $b$ (that is, rooted at level $2n + 2$
from the original root). Each time the algorithm evaluates a new subtree $b'_j$ of $b_j$, we
choose a new subtree $b'$ in $B_j$ and keep track of the correspondence, so that we can simulate
any further evaluations on $b'_j$ as evaluations on $b'$. The key difference produced by
this simulation is that the total number of evaluations on $b$ is usually larger than the
number of evaluations on each $b_j$. The number of steps in the division process in $b$
may also be larger. However, the number of evaluations on a subtree of $b_j$ is the same
as the number of evaluations on the corresponding subtree of $b$.
We begin with the case $l \le k - 2$. Here only type I exceptions are relevant. Assume
that there exists an algorithm $A$ that has probability $P$ of producing exceptions at
more than one subtree. This algorithm may evaluate on as many as $\beta^l$ subtrees. We
will produce a new algorithm $B$ simulating the old one using only $\beta$ subtrees. Each
time $A$ begins evaluations on a new subtree $b$, the algorithm $B$ randomly chooses one
of the $\beta$ subtrees, which we call $\hat{b}$, and sets apart enough nodes at level $n + 1$ of $\hat{b}$ to
simulate future evaluations of $A$ on $b$. Now the probability of $A$ producing exceptions
at more than one subtree is $P$. If this happens, consider the first two subtrees $b_1$, $b_2$
at which an exception is produced. The probability that $\hat{b}_1$ and $\hat{b}_2$ are the same
subtree is exactly $\beta^{-1}$. If they are not the same, algorithm $B$ also produces at least
two exceptions. This is because the condition for an exception at a subtree stays
the same before $\beta^{k-2}$ evaluations are taken, so the extra evaluations on $\hat{b}$ compared
to $b$ do not matter. Also, $\beta^{k-2}$ evaluations are not enough to begin the division
process at any $b$ or $\hat{b}$. This shows that the probability of exception for algorithm $B$ is at
least $P(1 - \beta^{-1})$. On the other hand, we may consider the algorithm $B$ as choosing $\beta$
subtrees and evaluating them up to $\beta^l$. So the probability of producing two exceptions
is at most $p_{k-1,l}^2\,\beta(\beta - 1)/2$ (adding together probabilities of exceptions at each pair
of subtrees). So
$$P(1 - \beta^{-1}) \le p_{k-1,l}^2\,\frac{\beta(\beta - 1)}{2} \implies P \le \frac{\beta^2}{2}\,(p_{k-1,l})^2.$$
Now consider the case $l = k - 1$. Again we simulate on only $\beta$ subtrees. The new
feature is that now it is possible to produce exceptions of type II on a subtree. We
need $\beta^{k-2}$ evaluations on a subtree to begin the division process, so there are at most
$\beta$ attempts, for a probability of at most $\beta\tilde{n}^{-1/5}$ to get one exception of type II. Now
suppose $A$ produces an exception of type I at subtree $b$. Then there is an exception
of type I on $\hat{b}$, even though the number of evaluations on $\hat{b}$ may be as many as $\beta^{k-1}$.
So the probability of one exception of type I is at most $p_{k-1,k-1}\beta$, and by the same
argument as in the case $l \le k - 2$, the probability of two exceptions of type I is at most
$(p_{k-1,k-1}\beta)^2$. Adding together all ways to produce two exceptions, we get
$$p_{k,k-1} \lesssim (\beta\tilde{n}^{-1/5} + p_{k-1,k-1}\beta)^2 \lesssim 4\beta^2\tilde{n}^{-2/5}.$$
Finally we consider the case $l = k$, again simulating on $\beta$ nodes. We may consider
exceptions of type I and type II separately. The probability of exceptions of type II is just
$\tilde{n}^{-1/5}$. Now focus on exceptions of type I. Suppose algorithm $A$ has an exception on
a subtree $b$. There are again two cases. First, suppose the exception is of type II. Since $\beta^{k-2}$
evaluations at $b$ are required to initiate the division process, there are at most $\beta^2$ tries.
So the probability of getting one such exception is at most $\beta^2\tilde{n}^{-1/5}$. Now suppose
the exception at $b$ is of type I. This means there are exceptions on two subtrees of $b$.
Then there are also exceptions on two subtrees of $\hat{b}$. The problem is that the number
of evaluations on $\hat{b}$ may exceed $\beta^{k-1}$, so the available values of $p_{k-1,j}$ cannot help
directly.
This suggests that we establish the following statement, used with $k \to k + 1$ and
$\hat{b}$ being a tree from $T_k$.
Lemma 3.4.1. Suppose an algorithm makes fewer than $\beta^{k+1}$ evaluations on a tree from
$T_k$. Then the probability of getting one exception on subtrees is $\lesssim (\beta^3 + \beta^2)\tilde{n}^{-1/5}$. The
probability of getting two exceptions on subtrees is $\lesssim (\beta^3 + \beta^2)^2\tilde{n}^{-2/5}$.
Proof. We will prove this by induction on $k$. For the case $k = 1$, there is no such thing
as exceptions on subtrees, so the probability is zero.
Given algorithm $A$, we produce a new algorithm $B$ by simulating on $\beta^2$ subtrees,
again randomly assigning any subtree $b$ evaluated by $A$ to one of the subtrees $\hat{b}$ in
the simulation. If $A$ evaluates more than $\beta^{k-1}$ leaves on a subtree $b$, then it will no
longer produce an exception at $b$, so we may as well stop simulating that subtree.
If $A$ produces a type II exception on a subtree $b$, then it must have evaluated more
than $\beta^{k-2}$ leaves on $b$, so there are at most $\beta^3$ tries, and the probability of getting
one exception this way is at most $\beta^3\tilde{n}^{-1/5}$. If $A$ produces a type I exception on $b$,
then we may use the case $k - 1$ of this lemma as long as the number of operations
made on $\hat{b}$ does not exceed $\beta^k$. We will bound the probability that the number
of evaluations exceeds $\beta^k$ on any of the $\beta^2$ subtrees used in the simulation. In
the simulation the number of evaluations corresponding to one subtree in $A$ is at
most $\beta^{k-1}$. Moreover the total number of evaluations is at most $\beta^{k+1}$. Lemma 3.4.2
below shows that the probability that the number of evaluations on a given one of the
$\beta^2$ subtrees exceeds $\beta^k$ is at most of order $1/\beta!$. The value $\beta!$ is superpolynomial
in $\tilde{n}$, so this probability may be ignored, even after multiplying by $\beta^2$. So up to a
negligible probability, a type I exception on $b$ implies at least two exceptions on the
subtrees of $\hat{b}$, which has probability $\lesssim (\beta^3 + \beta^2)^2\tilde{n}^{-2/5} < \tilde{n}^{-1/5}$ by the $k - 1$ case of the
lemma. The probability of collision is now $\beta^{-2}$. So the probability of getting one type
I exception is at most $\tilde{n}^{-1/5}\beta^2$ and the probability of getting two type I exceptions is
at most
$$\tilde{n}^{-2/5}\,\frac{\beta^2(\beta^2 - 1)}{2}\,\frac{1}{1 - \beta^{-2}} \lesssim \tilde{n}^{-2/5}\beta^4$$
by a similar argument as before. Adding together the probabilities we get the lemma
for case $k$.
Lemma 3.4.2. Let $\{a_i\}$ be a finite sequence of real numbers. Suppose each $a_i \le \beta^{k-1}$
and their sum is at most $\beta^{k+1}$. Consider a subset $S$ of $\{a_i\}$ constructed by placing
each $a_i$ into $S$ with probability $\beta^{-2}$. Then the probability that $\sum_{a_i \in S} a_i > \beta^k$ is at most of
order $1/\beta!$.
Proof. Normalize by dividing everything by $\beta^{k-1}$. The sum of elements in $S$ is a
random variable $X$ that is the sum of random variables $X_i$, with each mean value
$\bar{X}_i \le \beta^{-2}$, and $\sum_i \bar{X}_i \le 1$. Each $X_i$ takes $0$ with probability $1 - \beta^{-2}$ and a nonzero
value with probability $\beta^{-2}$. Then the distribution of $X$ is very closely Poisson with
parameter at most 1. So the probability that $X > \beta$ is at most of order $1/\beta!$. □
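The last step of this proof uses the fact that a Poisson variable with parameter at most 1 exceeds a cutoff $\beta$ with probability of order $1/\beta!$. The following is a minimal numerical sketch of that tail only; it takes the Poisson approximation itself as given. The names `poisson_tail` and `factorial_bound`, and the constant 2 in the comparison (which comes from $\sum_{j \ge \beta} 1/j! \le 2/\beta!$), are illustrative choices and do not appear in the original argument.

```python
import math

def poisson_tail(lam, beta, extra_terms=60):
    """P(Y >= beta) for Y ~ Poisson(lam), summed term by term.

    Summing the tail directly (instead of 1 - CDF) avoids cancellation
    when the tail is tiny. Terms beyond beta + extra_terms are dropped;
    for lam <= 1 they are far below double precision.
    """
    return sum(
        math.exp(-lam) * lam ** j / math.factorial(j)
        for j in range(beta, beta + extra_terms)
    )

def factorial_bound(beta):
    """Illustrative bound 2 / beta!, valid for lam <= 1 and beta >= 1."""
    return 2 / math.factorial(beta)

if __name__ == "__main__":
    for beta in (2, 5, 10):
        print(beta, poisson_tail(1.0, beta), factorial_bound(beta))
```

Since $\beta!$ grows faster than any fixed power of $\beta$, the printed tails shrink much faster than any polynomial in the cutoff.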
To finish the proof, we just need to sum together all possibilities for the case $l = k$.
This gives
$$p_{k,k} \lesssim \tilde{n}^{-1/5} + \left(\beta^2\tilde{n}^{-1/5} + (\beta^3 + \beta^2)^2\tilde{n}^{-2/5}\right)^2 \lesssim \tilde{n}^{-1/5}.$$
One can check that the bounds on $p_{k,l}$ given in Eq. 3.1 are satisfied.
3.5
An 'Easy' Boolean Function
In this section we give a boolean function that does not satisfy the conditions of the
theorem, and for which the classical algorithm can evaluate faster than implied by
the theorem. Consider the AND gate with (0, 0) and (1, 1) being the trivial inputs.
Condition 1 of the theorem is satisfied but condition 2 is not, since any tree with
root value 1 must have value 1 at all leaves. Consider any tree with height n and
satisfying the $k$-faults rule. If the root value is 0, then some thought shows that at
least a proportion $2^{-k}$ of the leaves must have value 0. If the root has value 1, then
all leaves have value 1. So one can obtain the value of the root by sampling $O(2^k)$
leaves, returning 0 if a leaf with value 0 is found and 1 otherwise. This gives a high probability of
success when the number of evaluations is a large but constant multiple of $2^k$.
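The sampling procedure described here can be sketched as follows. This is only an illustration under stated assumptions: leaves are addressed by an integer index, `query_leaf` is a hypothetical oracle returning 0 or 1, and `multiple` is the "large but constant" factor, whose default value of 10 is an arbitrary choice.

```python
import random

def evaluate_and_tree_by_sampling(query_leaf, num_leaves, k, multiple=10, rng=None):
    """Guess the root value of an AND tree satisfying the k-faults rule.

    query_leaf(i) is a hypothetical oracle returning the value of leaf i.
    We sample multiple * 2**k leaves uniformly at random and return 0 as
    soon as a 0-leaf is seen; otherwise we return 1.
    """
    rng = rng or random.Random()
    for _ in range(multiple * 2 ** k):
        if query_leaf(rng.randrange(num_leaves)) == 0:
            return 0  # a single 0-leaf certifies root value 0
    return 1  # no 0-leaf found: root is 1 with high probability

if __name__ == "__main__":
    leaves = [1] * 1024
    print(evaluate_and_tree_by_sampling(lambda i: leaves[i], len(leaves), k=3))
```

If a fraction at least $2^{-k}$ of the leaves is 0, each sample misses with probability at most $1 - 2^{-k}$, so the failure probability after `multiple * 2**k` samples is at most roughly $e^{-\mathrm{multiple}}$. The answer 0 is never wrong; only the answer 1 can be.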
Chapter 4
Quantum Walk Through Decision
Trees
In this chapter we will apply some previous results to quantum walks on decision
trees, as formulated by Farhi and Gutmann in [3]. In the first section, we will briefly
summarize the construction and main results in [3]. In the second section, we will
give a class of decision trees that is quantum penetrable in polynomial time.
4.1
Summary of Previous Work
Abstractly, a decision tree is just a binary tree. However, here we are considering the
binary tree as encoding some decision problem that involves finding a solution satisfying certain constraints, among a large number of possible solutions. For example,
in the 3-SAT problem on $n$ variables, we are asked to find an $n$-bit string $(x_1, \ldots, x_n)$
satisfying all of the given clauses. The example given in [3] is the "exact cover" problem. In this problem, we are given a set $S$ and a collection $A$ of subsets of $S$. We
want to find a subcollection $A' \subseteq A$ such that the union of sets in $A'$ is $S$, and that
the sets in A' are mutually disjoint. Again we can consider a possible solution as an
n-bit string, where n is the size of A. The strings encode which sets in A are chosen.
Each of the above two problems can be considered as a binary tree, with each
node in the tree representing a partial solution. One way of doing this, which we
will consider here, is to associate each node at level $i$ with a choice of the first $i$ bits,
and the two child nodes of each node $b$ correspond to the two ways of extending the
partial solution of $b$ (in this chapter we will let the root be at level 0, so the leaves are at
level n). Starting from the complete tree with height n, we remove all leaves that do
not satisfy the constraint. We may also remove nodes that represent partial solutions
that cannot be extended to a valid solution, for example those already violating some
of the constraints. The original problem is then converted to the problem of finding
a node at level n, starting from the root.
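As an illustration of this construction, the sketch below builds the decision tree for a small exact cover instance. It is a minimal example under stated assumptions: nodes are represented as tuples of bits, only partial solutions that already contain two overlapping sets are pruned early, and the function name `decision_tree_nodes` is hypothetical.

```python
from itertools import chain

def decision_tree_nodes(universe, subsets):
    """Return the nodes (bit tuples) kept in the exact cover decision tree.

    A node at level i fixes the first i bits; bit j says whether
    subsets[j] is chosen. Nodes whose chosen sets overlap are removed,
    and so are leaves (level n) that do not cover the whole universe.
    """
    universe = frozenset(universe)
    n = len(subsets)
    kept = []

    def visit(bits):
        chosen = [subsets[i] for i, bit in enumerate(bits) if bit]
        union = frozenset(chain.from_iterable(chosen))
        if sum(len(s) for s in chosen) != len(union):
            return  # two chosen sets overlap: this partial solution is dead
        if len(bits) == n and union != universe:
            return  # a leaf that is not an exact cover is removed
        kept.append(bits)
        if len(bits) < n:
            visit(bits + (0,))
            visit(bits + (1,))

    visit(())
    return kept

if __name__ == "__main__":
    nodes = decision_tree_nodes({1, 2, 3, 4}, [{1, 2}, {3, 4}, {2, 3}, {1, 4}])
    print([b for b in nodes if len(b) == 4])  # the surviving level-n nodes
```

Partial solutions that are still consistent but cannot be completed stay in the tree as dead-end branches, which is what makes reaching level $n$ nontrivial.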
One way of finding a node at level n is to perform a random walk through the
tree, starting from the root. At each time step, we choose a neighbor of the current
node with uniform probability and move to that neighbor. We say a class of decision
trees is classically penetrable if we can arrive at a node at level n within polynomial
time. That is, there exist constants $A$, $B$ such that for large height $n$, the probability
that we arrive at a node in level $n$ before $n^A$ steps is greater than $n^{-B}$.
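The classical walk can be simulated directly. The sketch below estimates the probability of reaching level $n$ within a step budget by Monte Carlo; it assumes nodes are bit tuples (so the level of a node is its length), the root is the empty tuple, and the tree is given as a hypothetical `neighbors` dictionary mapping each node to a list of adjacent nodes.

```python
import random

def reach_probability(neighbors, n, max_steps, trials=2000, seed=0):
    """Estimate P(random walk from the root hits level n within max_steps).

    neighbors maps each node (a tuple of bits) to the list of its
    neighbors (parent and children). The root is the empty tuple ().
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        node = ()
        for _ in range(max_steps):
            node = rng.choice(neighbors[node])  # uniform over neighbors
            if len(node) == n:
                hits += 1
                break
    return hits / trials

if __name__ == "__main__":
    # A bare path of height 2: root - (0,) - (0, 0)
    path = {(): [(0,)], (0,): [(), (0, 0)], (0, 0): [(0,)]}
    print(reach_probability(path, n=2, max_steps=10))
```

Classical penetrability then asks whether this probability stays above $n^{-B}$ when `max_steps` is $n^A$, for a whole family of trees of growing height.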
Now we construct a quantum system from this binary tree, by taking the Hamiltonian to be the adjacency matrix. In the original paper, the Hamiltonian is defined
as $\langle a|H|b\rangle = -1$ if $a$, $b$ are joined by an edge, and $\langle a|H|a\rangle = d(a)$, where $d(a)$ is the
number of neighbors of $a$. With this Hamiltonian the equations of motion resemble
those of the classical random walk. However, we will simply use the adjacency matrix
for a closer correspondence with previous sections. The difference is mostly a shift
in overall energy and a change of sign, so it should not have a large effect on the
analysis.
One way to define quantum penetrability for a class of decision trees is as follows.
Start with the initial state $|\mathrm{root}\rangle$ localized at the root of the tree. The class is quantum penetrable if there are constants $A$, $B$ and $C$ such that, after evolving the system for time $Cn^A$ and then
measuring in the basis of nodes, the probability that the state is on a
node in level $n$ is at least $n^{-B}$. In other words, there are $A$, $B$ and $C$ such that
$$\sum_a |\langle a|e^{-iHt}|\mathrm{root}\rangle|^2 > n^{-B},$$
where $t = Cn^A$, and the sum is taken over all nodes $a$ in level $n$. This problem is
found to be more difficult to analyze. Therefore we will make two simplifications:
first, we suppose there is only one node at level n, and second, we attach long lines of
nodes to both the root and the unique node at level n. Then we can consider the tree
as a runway with several branches coming out of it. See Figure 4-1 for an example.
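For small trees, the unsimplified definition (the probability of being found at level $n$ after evolving from the root) can be checked numerically. The sketch below assumes the adjacency-matrix Hamiltonian, represents nodes as bit tuples with the root as the empty tuple, and applies $e^{-iHt}$ through a truncated Taylor series so that only the standard library is needed. The Taylor series is an illustrative choice that is adequate only when the tree is small and $t$ is moderate; the function names are hypothetical.

```python
import math

def evolve_from_root(adjacency, root, t, terms=80):
    """Return e^{-iHt}|root> as a dict node -> amplitude.

    H is the adjacency matrix given by the dict adjacency (node -> list
    of neighbors). The exponential is summed as a Taylor series.
    """
    state = {node: 0j for node in adjacency}
    term = {node: (1 + 0j if node == root else 0j) for node in adjacency}
    for k in range(terms):
        for node in adjacency:
            state[node] += term[node]
        # next Taylor term: (-i t / (k + 1)) * H * (current term)
        term = {
            node: (-1j * t / (k + 1)) * sum(term[nb] for nb in adjacency[node])
            for node in adjacency
        }
    return state

def level_probability(state, n):
    """Probability of measuring a node at level n (a tuple of length n)."""
    return sum(abs(amp) ** 2 for node, amp in state.items() if len(node) == n)

if __name__ == "__main__":
    two_nodes = {(): [(0,)], (0,): [()]}
    print(level_probability(evolve_from_root(two_nodes, (), math.pi / 2), 1))
```

For the two-node tree the amplitude at the leaf is $-i\sin t$, so the level-1 probability is $\sin^2 t$.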
The initial state is a wave packet on the left part of the runway travelling toward
the right, with a peak around E = 0 in the energy spectrum. As the system evolves,
the packet will travel through the trees coming out the runway, and one part of
it will clear all of the trees and continue toward the right. For a given energy E,
we may define the transmission coefficient T(E) as usual. Then we say the tree is
quantum-penetrable if there exist constants $A$, $B$ and $C$ such that $|T(E)| > n^{-B}$ for
all $|E| < Cn^{-A}$. Moreover the phase of $T(E)$ should not fluctuate with frequency
more than polynomial in $n$. This condition implies that a wave packet containing
energies $E$ with $|E| < Cn^{-A}$ will mostly transmit through the trees, so the time
of evolution and length of runway required for the packet to pass through are both
polynomial in n.
As in the previous chapters, we can define a function $y_m(E)$ for each tree coming
out of the node $m$. If $|m\rangle$ is the $m$'th node on the runway and $|m'\rangle$ is the node
just above $m$ in Figure 4-1, then we define $y_m(E) = \langle m'|E\rangle / \langle m|E\rangle$, where $|E\rangle$ is
the eigenstate of the system with energy $E$ (for this definition we again imagine the
runways to be infinite; see section 1.1).
Figure 4-1: An example of a decision tree with two lines attached to the root and the
unique node at level 4.
The transmission coefficients $T(E)$ are determined from $y_m(E)$. See [3] for details.
Here we will just give the result. Let
$$M_m = \begin{pmatrix} E - y_m(E) & -1 \\ 1 & 0 \end{pmatrix}$$
and
$$M = M_{n-1} M_{n-2} \cdots M_0 = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$
Then
$$T(E) = \frac{2i\sin\theta}{c - b + (d - a)\cos\theta + i(d + a)\sin\theta}, \tag{4.1}$$
where $E = 2\cos\theta$.
Note that the expressions for $M_m$ and $E$ are slightly different due to the use of a
different Hamiltonian.
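The transfer-matrix recipe can be sketched in a few lines. The code below assumes $M_m$ has rows $(E - y_m(E),\, -1)$ and $(1,\, 0)$, that $M = M_{n-1} \cdots M_0$ has entries $a, b, c, d$, and that $T(E) = 2i\sin\theta / (c - b + (d - a)\cos\theta + i(d + a)\sin\theta)$ with $E = 2\cos\theta$. The function name `transmission` is hypothetical, and the values of $y_m$ used in the examples ($1/E$ for a single dangling node, $E/(E^2 - 1)$ for a dangling two-node line) are worked out here from the eigenvalue equation of the adjacency matrix rather than taken from the text.

```python
import math

def transmission(E, y_values):
    """Transmission coefficient for a runway with branch functions y_m(E).

    y_values[m] is y_m(E) for runway nodes m = 0, ..., n-1 (0 for a
    runway node with no branch attached). Requires |E| < 2.
    """
    a, b, c, d = 1.0, 0.0, 0.0, 1.0  # running product, starts as identity
    for y in y_values:  # left-multiply by M_m for m = 0, 1, ...
        top = E - y
        a, b, c, d = top * a - c, top * b - d, a, b
    theta = math.acos(E / 2)
    denom = (c - b) + (d - a) * math.cos(theta) + 1j * (d + a) * math.sin(theta)
    return 2j * math.sin(theta) / denom

if __name__ == "__main__":
    E = 0.01
    print(abs(transmission(E, [0.0] * 5)))          # bare runway
    print(abs(transmission(E, [1 / E])))            # one dangling node
    print(abs(transmission(E, [E / (E * E - 1)])))  # dangling two-node line
```

In this sketch a bare runway transmits perfectly, a single dangling node nearly blocks the packet at small $E$, and a dangling line with two nodes is nearly transparent.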
Farhi and Gutmann gave in [3] a family of trees that is quantum penetrable but
not classically penetrable. However, it is possible to design a classical algorithm that,
knowing only the depth of nodes in the tree (that is, which way is up and which way
is down), can find the node at level n in polynomial time for this family of trees. Now
we will give a wider class of trees that is quantum penetrable, but for which there
may not be such classical algorithms.
4.2
Main Result
First, we establish the following intuitive result: if for some energy $E$ near zero,
each $y_m(E)$ is small (that is, each tree coming out of the runway has transmission
coefficient near 1), then the overall amount of transmission $|T(E)|$ is near 1. Moreover
the phase of $T(E)$ does not fluctuate quickly.
We will prove this by visualizing the entries in M as a weighted sum over paths
from left to right in Figure 4-2, where each path is given the weight that is the product
of weights on edges in the path.
Figure 4-2: Visualization of entries in $M$. Each $M_{ij}$ corresponds to a weighted sum
of paths going from $i$ on the left to $j$ on the right.
This is a result of writing
$$M_{ij} = \sum_{s_1, s_2, \ldots, s_{n-1} \in \{1,2\}} (M_{n-1})_{i,s_{n-1}} (M_{n-2})_{s_{n-1},s_{n-2}} \cdots (M_1)_{s_2,s_1} (M_0)_{s_1,j}.$$
If $E$ is small and each $y_m(E)$ is small, then going horizontally will sharply decrease the weight of a path. So the paths with the highest weights by magnitude are
those that never travel along the horizontal line. If $n$ is even, there is such a path
for each $i = j$, with weight $(-1)^{n/2}$. If $n$ is odd, then there is such a path for each
$i \ne j$, with weights $\pm 1$. There are at most $n^a$ paths that travel along the horizontal
line for $a$ steps. Let $c = \max_m |E - y_m(E)|$. Then the total contribution from paths
that travel along the horizontal line is at most $\sum_{a \ge 1} c^a n^a = O(cn)$ for $cn$ small. When
$E \approx 0$, we have $\cos\theta = O(E)$ and $1 - \sin\theta = O(E^2)$. From equation 4.1, we see that
$1 - |T(E)| = O(cn) + O(E^2)$. Moreover $T(E)$ is always close to one of $\pm i e^{-in\theta}$, so the
phase also does not fluctuate much.
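The path-counting estimate can be checked numerically. The sketch below assumes each factor has rows $(t_m, -1)$ and $(1, 0)$, where $t_m = E - y_m(E)$ is the weight of the horizontal edge, and compares the full product with the product obtained by setting every $t_m$ to zero, which keeps only the paths that avoid the horizontal line. The function names and the comparison bound $(1 + c)^n - 1$ (the sum of $\binom{n}{a} c^a$ over $a \ge 1$) are illustrative choices.

```python
def runway_product(tops):
    """Entries (a, b, c, d) of M = M_{n-1} ... M_0.

    Each M_m has rows (tops[m], -1) and (1, 0); tops[m] plays the role
    of the horizontal edge weight E - y_m(E).
    """
    a, b, c, d = 1.0, 0.0, 0.0, 1.0
    for top in tops:  # left-multiply by the next factor
        a, b, c, d = top * a - c, top * b - d, a, b
    return a, b, c, d

def max_deviation(n, c):
    """Largest entrywise change in M caused by horizontal weights equal to c."""
    exact = runway_product([c] * n)
    leading = runway_product([0.0] * n)  # paths that avoid the horizontal line
    return max(abs(x - y) for x, y in zip(exact, leading))

if __name__ == "__main__":
    print(runway_product([0.0] * 4))  # even n: plus or minus the identity
    print(runway_product([0.0] * 3))  # odd n: only off-diagonal entries
    print(max_deviation(20, 0.001), (1 + 0.001) ** 20 - 1)
```

With all horizontal weights set to zero the product is plus or minus the identity for even $n$ and purely off-diagonal for odd $n$, and for small $cn$ the deviation of the full product stays of order $cn$.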
In particular, if there are $E_0$ and $s_m$ such that for each $m$ and $|E| < E_0$ we have
$|y_m(E)| < s_m|E|$, then the transmission coefficient $T(E)$ satisfies $1 - |T(E)| = O(\max_m n s_m |E|) + O(n|E|)$. Combining with the results in section 2.1, we have the
following result.
Theorem 4.2.1. For a given decision tree, consider each tree $T_m$ coming out of the
runway as a NAND tree with each leaf given value 0. If each $T_m$ evaluates to 1 and
satisfies the $k$-faults rule for a fixed $k$, then $1 - |T(E)| = O(n^2 2^k |E|)$ for $|E| = O(n^{-2} 2^{-k})$.
Moreover the phase of $T(E)$ does not fluctuate much in this range of $E$. In particular,
a class of decision trees is quantum penetrable if each tree satisfies this condition with
$k < a \log n$ for a fixed constant $a$.
Proof. Follows from the previous discussion with $s_m = O(n \cdot 2^k)$. □
For instance, the example of even-height bushes given in [3] contains trees with
no faults. The condition on the parity of the tree corresponds to the condition that
these trees evaluate to 1 at the root.
Chapter 5
Conclusion
In this paper we investigated the running time of quantum walk algorithms for evaluating boolean functions on a special set of inputs. We showed that on these inputs,
there is a super-polynomial separation in cost between the quantum algorithm and
classical algorithms. We also applied this result to give a class of decision trees that is
quantum penetrable, but may be difficult for a classical algorithm to search through.
There are still many directions in which one can extend these results. First, one
can try to formulate a better condition for when a NAND tree can be evaluated
quickly by the quantum walk algorithm (that is, for which trees is the growth rate of
y(E) polynomial in n). We showed at the end of section 2.3 that there are trees which
do not satisfy the k-faults rule for any small k, but nevertheless have y(E) growing at
rate polynomial in n. We would like to find a succinct condition that includes these
trees and many other such examples.
This question can also be asked for general span programs. Since the process
of assigning weights to nodes described in section 2.3 is used to motivate the above
example, a related question is how to generalize this process to other span programs,
and how to construct rigorous proofs based on this technique.
The same situation applies to classifying quantum penetrable decision trees. It is
clear that the previous question is a necessary first step in this classification. One
can also consider what can be said in the related but more realistic situation where
there are many valid solutions and there are no lines attached to those solutions.
In the realm of classical computing, we have not yet proved that the class of
decision trees given in section 4.2 is not efficiently searchable by a classical algorithm
that knows depth in the tree. Finally, we can ask whether it is possible to construct
more realistic classical problems (for example, non-oracular problems), that generate
NAND trees, general span programs, or decision trees with restriction to inputs as in
the paper.
Bibliography
[1] Andris Ambainis. A nearly optimal discrete query quantum algorithm for evaluating NAND formulas. arXiv:0704.3628v1 [quant-ph].
[2] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum algorithm for
the Hamiltonian NAND tree. quant-ph/0702144v2.
[3] Edward Farhi and Sam Gutmann. Quantum computation and decision trees.
quant-ph/9706062.
[4] Ben W. Reichardt. Span-program-based quantum algorithm for evaluating unbalanced formulas. arXiv:0907.1622v1 [quant-ph].
[5] Ben W. Reichardt. Span programs and quantum query complexity: The general
adversary bound is nearly tight for every boolean function. arXiv:0904.2759v1
[quant-ph].
[6] Ben W. Reichardt and Robert Spalek. Span-program-based quantum algorithm
for evaluating formulas. arXiv:0710.2630v3 [quant-ph].
[7] Michael Saks and Avi Wigderson. Probabilistic boolean decision trees and the
complexity of evaluating game trees. Proc. 27th FOCS, pages 29-38, 1986.