The Learnability of Quantum States

Scott Aaronson
University of Waterloo
Quantum State Tomography
Suppose we have a physical process that produces a
quantum mixed state ρ
By applying the process repeatedly, we can prepare as
many copies of ρ as we want
To each copy, we then apply a binary measurement E,
obtaining ‘1’ with probability Tr(Eρ) and ‘0’ otherwise
Our goal is to learn an approximate description of ρ
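As a minimal sketch of one measurement step, assuming a hypothetical one-qubit state ρ written as a 2×2 density matrix and E chosen as the projector onto |0⟩, each copy yields ‘1’ with probability Tr(Eρ), so the fraction of ‘1’ outcomes over many copies estimates that number. This is an illustration of the setup only, not a tomography procedure:

```python
import random

def trace_of_product(a, b):
    """Return Tr(AB) for square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

def measure_copies(E, rho, copies, seed=0):
    """Apply the binary measurement E to `copies` fresh copies of rho.

    Each copy yields 1 with probability Tr(E rho) and 0 otherwise;
    returns the fraction of 1 outcomes.
    """
    p = trace_of_product(E, rho).real
    rng = random.Random(seed)
    ones = sum(1 for _ in range(copies) if rng.random() < p)
    return ones / copies

# Hypothetical example: a mixed one-qubit state and the projector onto |0>.
rho = [[0.75, 0.25], [0.25, 0.25]]
E = [[1, 0], [0, 0]]
print(trace_of_product(E, rho))          # 0.75
print(measure_copies(E, rho, 100_000))   # empirical estimate of Tr(E rho)
```

Learning ρ itself means estimating Tr(Eρ) for many different E, which is where the cost discussed next comes from.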
EXPERIMENTALISTS
ACTUALLY DO THIS
To learn about chemical reactions (Skovsen et al.
2003), test equipment (D’Ariano et al. 2002), study
decoherence mechanisms (Resch et al. 2005), …
But there’s a problem…
To do tomography on an entangled state of n qubits, we
need on the order of 4ⁿ measurements
(Cartoon aside: “Fear not! Why would he be raising this ‘problem’ if he wasn’t gonna demolish it?”)
The current record: 8 qubits (Häffner et al. 2005),
requiring 656,100 experiments (!)
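The figure 656,100 equals 3⁸ × 100. One reading, assuming (this is not stated on the slide) that the experiment used all 3ⁿ local Pauli measurement settings, X, Y or Z on each qubit, with 100 repetitions per setting, can be checked directly. Extending the same hypothetical count to larger n shows why this approach cannot scale:

```python
def experiments_needed(n_qubits, repetitions=100):
    """Total runs if each of the 3**n local Pauli settings is repeated
    `repetitions` times. Both the 3**n settings and the 100 repetitions
    are assumptions used only to account for the quoted figure."""
    return (3 ** n_qubits) * repetitions

print(experiments_needed(8))                  # 656100
print(len(str(experiments_needed(10_000))))   # decimal digits in the count for 10,000 qubits
```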
Does this mean that a generic 10,000-particle state can
never be “learned” within the lifetime of the universe?
If so, that would call into question the operational status of
quantum states themselves (and make quantum
computing skeptics extremely happy)…
The Quantum Occam’s Razor Theorem
Let ρ be an n-qubit mixed state. Let D be a distribution
over two-outcome measurements. Suppose we draw
m measurements E1,…,Em independently from D, and
then output a “hypothesis state” σ such that
|Tr(Eiσ)−Tr(Eiρ)| ≤ η for all i. Then provided η ≤ γε/10 and
$$m \ge \frac{1}{\gamma^2\varepsilon^2}\left(\frac{n}{\gamma^2\varepsilon^2}\log^2\frac{1}{\gamma\varepsilon}+\log\frac{1}{\delta}\right),$$
we’ll have
$$\Pr_{E\in D}\left[\,\left|\mathrm{Tr}(E\sigma)-\mathrm{Tr}(E\rho)\right|\le\gamma\,\right]\ge 1-\varepsilon$$
with probability at least 1−δ over E1,…,Em
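To see the linear dependence on n, the sketch below evaluates m ≥ (1/(γ²ε²))((n/(γ²ε²))·log²(1/(γε)) + log(1/δ)). It assumes a leading constant of 1 and base-2 logarithms; neither is fixed by the theorem as stated, so the absolute numbers are only illustrative:

```python
import math

def occam_sample_bound(n, gamma, eps, delta):
    """Evaluate (1/(g*e)^2) * ((n/(g*e)^2) * log2(1/(g*e))**2 + log2(1/delta)).

    Assumptions: leading constant 1 and base-2 logarithms
    (neither is fixed by the theorem as stated)."""
    ge = gamma * eps
    return (1 / ge**2) * ((n / ge**2) * math.log2(1 / ge) ** 2
                          + math.log2(1 / delta))

# Each tenfold increase in n multiplies the bound by roughly ten:
# the growth is linear in n, not exponential like 4**n.
for n in (10, 100, 1000, 10_000):
    print(n, math.ceil(occam_sample_bound(n, gamma=0.2, eps=0.2, delta=0.01)))
```

The per-qubit cost is steep in 1/(γε); the point of the example is only that it is multiplied by n rather than by an exponential in n.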
Remarks
Implies that we can do “pretty good tomography,” using
a number of measurements that grows only linearly (!)
with the number of qubits n
Result says nothing about the computational
complexity of preparing a hypothesis state that agrees
with measurement results
Can make the dependence on γ and ε more reasonable,
at the cost of a log² n factor:
$$m = O\!\left(\frac{1}{\varepsilon}\left(\frac{n}{\gamma^2}\log^2\frac{n}{\gamma\varepsilon}+\log\frac{1}{\delta}\right)\right)$$
The above bound is nearly tight:
$$m = \Omega\!\left(\frac{1}{\varepsilon}\left(\frac{n}{\gamma^2}+\log\frac{1}{\delta}\right)\right)$$
To prove the theorem, we need a notion
introduced by Kearns and Schapire called
Fat-Shattering Dimension
Let C be a class of functions from S to [0,1]. We say a set
{x1,…,xk}⊆S is γ-shattered by C if there exist reals a1,…,ak
such that, for all 2ᵏ possible statements of the form
f(x1)≤a1−γ ∧ f(x2)≥a2+γ ∧ … ∧ f(xk)≤ak−γ,
there’s some f∈C that satisfies the statement.
Then fat_C(γ), the γ-fat-shattering dimension of C, is the
size of the largest set γ-shattered by C.
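For a finite class of functions, the definition can be checked by brute force over all 2ᵏ patterns. The sketch below tests one candidate set with one given choice of witnesses a1,…,ak; computing fat_C(γ) itself would also require searching over sets and witnesses. The example class is hypothetical:

```python
from itertools import product

def is_gamma_shattered(points, functions, witnesses, gamma):
    """Check whether `points` is gamma-shattered by the finite class
    `functions`, using the given witness reals a_1..a_k.

    For each of the 2**k patterns, some f must satisfy
    f(x_i) <= a_i - gamma (bit 0) or f(x_i) >= a_i + gamma (bit 1)."""
    for pattern in product((0, 1), repeat=len(points)):
        if not any(
            all((f(x) >= a + gamma) if bit else (f(x) <= a - gamma)
                for x, a, bit in zip(points, witnesses, pattern))
            for f in functions
        ):
            return False
    return True

# Hypothetical class: the four functions on {0, 1} taking values 0.1 or 0.9.
values = [(0.1, 0.1), (0.1, 0.9), (0.9, 0.1), (0.9, 0.9)]
funcs = [lambda x, v=v: v[x] for v in values]
print(is_gamma_shattered([0, 1], funcs, [0.5, 0.5], gamma=0.3))   # True
print(is_gamma_shattered([0, 1], funcs, [0.5, 0.5], gamma=0.45))  # False
```

With γ = 0.3 every pattern is realized with margin; at γ = 0.45 no value lies far enough from the witness 0.5.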
Small Fat-Shattering Dimension Implies Small Sample Complexity
Let C be a class of functions from S to [0,1], and let f∈C.
Suppose we draw m elements x1,…,xm independently from
some distribution D, and then output a hypothesis h∈C
such that |h(xi)−f(xi)|≤η for all i. Then provided η≤γε/7 and
$$m \ge \frac{1}{\gamma^2\varepsilon^2}\left(\mathrm{fat}_C\!\left(\frac{\gamma\varepsilon}{35}\right)\log^2\frac{1}{\gamma\varepsilon}+\log\frac{1}{\delta}\right),$$
we’ll have
$$\Pr_{x\in D}\left[\,|h(x)-f(x)|\le\gamma\,\right]\ge 1-\varepsilon$$
with probability at least 1−δ over x1,…,xm.
Proof uses a 1996 result of Bartlett and Long, building on
Alon et al., building on Blumer et al., building on Valiant
Upper-Bounding the Fat-Shattering Dimension of Quantum States
Nayak 1999: If we want to “encode” k classical bits into
n qubits, in such a way that any bit can be recovered
with probability 1−p, then we need n ≥ (1−H(p))k, where H is the binary entropy function
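A quick worked instance of Nayak’s bound, using the standard binary entropy H(p) = −p log₂ p − (1−p) log₂(1−p): even if each encoded bit only needs to be recovered with probability 0.9, the encoding of 1000 bits still needs at least 532 qubits, so quantum encodings save at most a constant factor here. The function names are illustrative:

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1-p) log2(1-p), with H(0) = H(1) = 0."""
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def nayak_min_qubits(k, p):
    """Smallest integer n satisfying n >= (1 - H(p)) * k."""
    return math.ceil((1 - binary_entropy(p)) * k)

print(nayak_min_qubits(1000, 0.1))  # 532
print(nayak_min_qubits(1000, 0.0))  # 1000
```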
Corollary (“turning Nayak’s result on its head”):
Let C_n be the set of functions that map an n-qubit
measurement E to Tr(Eρ), for some ρ. Then
$$\mathrm{fat}_{C_n}(\gamma) = O\!\left(\frac{n}{\gamma^2}\right).$$
(Cartoon aside: “No need to thank me!”)
Quantum Occam’s Razor Theorem
follows easily…
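One way to spell out “follows easily”: apply the fat-shattering sample bound with C = C_n, f(E) = Tr(Eρ) and h(E) = Tr(Eσ), both of which lie in C_n. The theorem’s condition η ≤ γε/10 implies the η ≤ γε/7 needed there, and the corollary bounds the fat-shattering dimension at scale γε/35. Here K is a constant introduced only for this sketch, absorbing the factor 35² and the constant hidden in the corollary’s O(·):

```latex
\begin{align*}
\mathrm{fat}_{C_n}\!\left(\tfrac{\gamma\varepsilon}{35}\right)
  &= O\!\left(\frac{n}{(\gamma\varepsilon/35)^2}\right)
   = O\!\left(\frac{n}{\gamma^2\varepsilon^2}\right), \\[4pt]
m \ge \frac{K}{\gamma^2\varepsilon^2}\left(\frac{n}{\gamma^2\varepsilon^2}\log^2\frac{1}{\gamma\varepsilon}+\log\frac{1}{\delta}\right)
  &\;\Longrightarrow\;
m \ge \frac{1}{\gamma^2\varepsilon^2}\left(\mathrm{fat}_{C_n}\!\left(\tfrac{\gamma\varepsilon}{35}\right)\log^2\frac{1}{\gamma\varepsilon}+\log\frac{1}{\delta}\right).
\end{align*}
```

The right-hand condition is exactly the hypothesis of the fat-shattering theorem, so its conclusion, Pr over E∈D of |Tr(Eσ) − Tr(Eρ)| ≤ γ being at least 1−ε with probability at least 1−δ, carries over.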
Simple Application of Quantum Occam’s Razor Theorem to Communication Complexity
(Diagram: Alice holds x; Bob holds y; output f(x,y))
f: Boolean function mapping Alice’s N-bit string x and
Bob’s M-bit string y to a binary output
D1(f), R1(f), Q1(f): Deterministic, randomized, and
quantum one-way communication complexities of f
How much can quantum communication save?
• It’s known that D1(f)=O(M Q1(f)) for all total f
• In 2004 I showed that for all f, D1(f)=O(M Q1(f) log Q1(f))
Theorem: R1(f)=O(M Q1(f))
for all f, partial or total
Proof: Fix Alice’s input x
By Yao’s minimax principle, Alice can consider a worst-case distribution D over Bob’s input y
Alice’s classical message will consist of y1,…,yT drawn
from D, together with f(x,y1),…,f(x,yT), where T=O(Q1(f))
Bob searches for a quantum message ρ that yields the
right answers on y1,…,yT
By the Quantum Occam’s Razor Theorem, with high
probability such a ρ yields the right answers on most y
drawn from D
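The message format and Bob’s consistency check can be sketched with a purely classical stand-in. In this illustration Bob’s candidate quantum messages are replaced by a hypothetical finite list of Boolean predicates on y, and “searching for ρ” becomes picking the first candidate that agrees with all of Alice’s labelled samples. It shows the structure of the protocol only, not the quantum part of the argument:

```python
import random

def alice_message(f_x, sample_y, T, rng):
    """Alice, with x fixed inside f_x, sends T samples y_j ~ D with labels f(x, y_j)."""
    return [(y, f_x(y)) for y in (sample_y(rng) for _ in range(T))]

def bob_answer(message, candidates, y):
    """Bob picks the first candidate consistent with every labelled sample
    and uses it on his own input y (None if no candidate is consistent)."""
    for h in candidates:
        if all(h(yj) == label for yj, label in message):
            return h(y)
    return None

# Hypothetical setup: Bob's input is a 4-bit integer (M = 4), D is uniform,
# and the stand-in "messages" are threshold predicates y >= t.
candidates = [lambda y, t=t: int(y >= t) for t in range(17)]
f_x = candidates[9]                      # Alice's function y -> f(x, y)
rng = random.Random(1)
msg = alice_message(f_x, lambda r: r.randrange(16), T=30, rng=rng)
agree = sum(bob_answer(msg, candidates, y) == f_x(y) for y in range(16))
print(agree, "of 16 inputs answered correctly")
```

Alice sends T samples of M bits plus one label bit each, i.e. T(M+1) bits, which with T=O(Q1(f)) is O(M Q1(f)).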
What about computational complexity?
BQP/qpoly: Class of problems solvable in quantum
polynomial time, with help from poly-size “quantum
advice state” that depends only on input length n
A. 2004: BQP/qpoly ⊆ PP/poly
“Classical advice can always simulate quantum advice,
provided we use exponentially more computation”
Can this result be improved to BQP/qpoly ⊆ QMA/poly?
(QMA: Quantum Merlin-Arthur)
Theorem: HeurBQP/qpoly ⊆ HeurQMA/poly
Or in English: We can use trusted classical advice to verify
that untrusted quantum advice will work on most inputs
Proof Idea: The classical advice to the HeurQMA/poly
verifier will consist of “training inputs” x1,…,xm where
m=poly(n), as well as whether xi∈L for all i
Given a purported quantum advice state |ψ⟩, the verifier
first checks that |ψ⟩ yields the right answers on the training
inputs, and only then uses it on its real input x
By the Quantum Occam’s Razor
Theorem, if |ψ⟩ passes the initial test,
then w.h.p. it works on most inputs
Technical part is to do the verification
without destroying |ψ⟩
Stronger Result: HeurBQP/qpoly = HeurYQP/poly
Here YQP (“Yoda Quantum Polynomial-Time”) is like
QMA∩coQMA, except that a single witness must work for all
inputs of length n

Open Problems

Computationally-efficient learning algorithms
Experimental implementation!
Tighter bounds on number of measurements
Does BQP/qpoly = YQP/poly?
Is D1(f) = O(M Q1(f))?