The Learnability of Quantum States
Scott Aaronson
University of Waterloo

Quantum State Tomography

Suppose we have a physical process that produces a quantum mixed state $\rho$. By applying the process repeatedly, we can prepare as many copies of $\rho$ as we want. To each copy, we then apply a binary measurement $E$, obtaining ‘1’ with probability $\mathrm{Tr}(E\rho)$ and ‘0’ otherwise. Our goal is to learn an approximate description of $\rho$.

EXPERIMENTALISTS ACTUALLY DO THIS: to learn about chemical reactions (Skovsen et al. 2003), test equipment (D’Ariano et al. 2002), study decoherence mechanisms (Resch et al. 2005), …

But there’s a problem…
[Callout: “Fear not! Why would he be raising this ‘problem’ if he wasn’t gonna demolish it?”]

To do tomography on an entangled state of n qubits, we need $\Omega(4^n)$ measurements.

The current record: 8 qubits (Häffner et al. 2005), requiring 656,100 experiments (!)

Does this mean that a generic 10,000-particle state can never be “learned” within the lifetime of the universe? If so, this would call into question the operational status of quantum states themselves (and make quantum computing skeptics extremely happy)…

The Quantum Occam’s Razor Theorem

Let $\rho$ be an n-qubit mixed state. Let $D$ be a distribution over two-outcome measurements. Suppose we draw m measurements $E_1,\dots,E_m$ independently from $D$, and then output a “hypothesis state” $\sigma$ such that $|\mathrm{Tr}(E_i\sigma)-\mathrm{Tr}(E_i\rho)| \le \eta$ for all i. Then provided $\eta \le \gamma\varepsilon/10$ and

$$m \ge \frac{K}{\gamma^2\varepsilon^2}\left(\frac{n}{\gamma^2\varepsilon^2}\log^2\frac{1}{\gamma\varepsilon} + \log\frac{1}{\delta}\right)$$

for a suitable constant K, we’ll have

$$\Pr_{E\sim D}\left[\,|\mathrm{Tr}(E\sigma)-\mathrm{Tr}(E\rho)| \le \gamma\,\right] \ge 1-\varepsilon$$

with probability at least $1-\delta$ over $E_1,\dots,E_m$.

Remarks
• Implies that we can do “pretty good tomography,” using a number of measurements that grows only linearly (!)
  with the number of qubits n.
• The result says nothing about the computational complexity of preparing a hypothesis state that agrees with the measurement results.
• We can make the dependence on $\gamma$ and $\varepsilon$ more reasonable, at the cost of a $\log^2 n$ factor:
$$m = O\left(\frac{1}{\varepsilon}\left(\frac{n}{\gamma^2}\log^2\frac{n}{\gamma\varepsilon} + \log\frac{1}{\delta}\right)\right)$$
• The above bound is nearly tight: $m = \Omega\left(\frac{1}{\varepsilon}\left(\frac{n}{\gamma^2} + \log\frac{1}{\delta}\right)\right)$.

Fat-Shattering Dimension

To prove the theorem, we need a notion introduced by Kearns and Schapire called the fat-shattering dimension. Let C be a class of functions from S to [0,1]. We say a set $\{x_1,\dots,x_k\} \subseteq S$ is $\gamma$-shattered by C if there exist reals $a_1,\dots,a_k$ such that, for all $2^k$ possible statements of the form

$$f(x_1) \le a_1-\gamma \;\wedge\; f(x_2) \ge a_2+\gamma \;\wedge\; \cdots \;\wedge\; f(x_k) \le a_k-\gamma,$$

there’s some $f \in C$ that satisfies the statement. Then $\mathrm{fat}_C(\gamma)$, the $\gamma$-fat-shattering dimension of C, is the size of the largest set $\gamma$-shattered by C.

Small Fat-Shattering Dimension Implies Small Sample Complexity

Let C be a class of functions from S to [0,1], and let $f \in C$. Suppose we draw m elements $x_1,\dots,x_m$ independently from some distribution D, and then output a hypothesis $h \in C$ such that $|h(x_i)-f(x_i)| \le \eta$ for all i. Then provided $\eta \le \gamma\varepsilon/7$ and

$$m \ge \frac{K}{\gamma^2\varepsilon^2}\left(\mathrm{fat}_C\!\left(\frac{\gamma\varepsilon}{35}\right)\log^2\frac{1}{\gamma\varepsilon} + \log\frac{1}{\delta}\right),$$

we’ll have $\Pr_{x\sim D}\left[\,|h(x)-f(x)| \le \gamma\,\right] \ge 1-\varepsilon$ with probability at least $1-\delta$ over $x_1,\dots,x_m$.

The proof uses a 1996 result of Bartlett and Long, building on Alon et al., building on Blumer et al., building on Valiant.

Upper-Bounding the Fat-Shattering Dimension of Quantum States

Nayak 1999: If we want to “encode” k classical bits into n qubits, in such a way that any bit can be recovered with probability $1-p$, then we need $n \ge (1-H(p))\,k$.

Corollary (“turning Nayak’s result on its head”): Let $C_n$ be the set of functions that map an n-qubit measurement E to $\mathrm{Tr}(E\rho)$, for some $\rho$. Then $\mathrm{fat}_{C_n}(\gamma) = O(n/\gamma^2)$.
[Callout: “No need to thank me!”]

The Quantum Occam’s Razor Theorem follows easily: plug $\mathrm{fat}_{C_n}(\gamma\varepsilon/35) = O(n/(\gamma^2\varepsilon^2))$ into the sample-complexity bound above…

Simple Application of Quantum Occam’s Razor Theorem to Communication Complexity

[Diagram: Alice holds x, Bob holds y; Alice sends a single message to Bob, who outputs f(x,y).]

f: Boolean function mapping Alice’s N-bit string x and Bob’s M-bit string y to a binary output
$D^1(f)$, $R^1(f)$, $Q^1(f)$: deterministic, randomized, and quantum one-way communication complexities of f

How much can quantum communication save?
• It’s known that $D^1(f) = O(M\,Q^1(f))$ for all total f.
• In 2004 I showed that for all f, $D^1(f) = O(M\,Q^1(f)\log Q^1(f))$.

Theorem: $R^1(f) = O(M\,Q^1(f))$ for all f, partial or total.

Proof: Fix Alice’s input x. By Yao’s minimax principle, Alice can consider a worst-case distribution D over Bob’s input y. Alice’s classical message will consist of $y_1,\dots,y_T$ drawn from D, together with $f(x,y_1),\dots,f(x,y_T)$, where $T = \Theta(Q^1(f))$; since each $y_i$ is M bits long, the message has length $O(M\,Q^1(f))$. Bob searches for a quantum message $\rho$ that yields the right answers on $y_1,\dots,y_T$. By the Quantum Occam’s Razor Theorem, with high probability such a $\rho$ yields the right answers on most y drawn from D.

What about computational complexity?

BQP/qpoly: class of problems solvable in quantum polynomial time, with help from a poly-size “quantum advice state” that depends only on the input length n.

A. 2004: $\mathsf{BQP/qpoly} \subseteq \mathsf{PP/poly}$
“Classical advice can always simulate quantum advice, provided we use exponentially more computation”

Can this result be improved to $\mathsf{BQP/qpoly} \subseteq \mathsf{QMA/poly}$? (QMA: Quantum Merlin-Arthur)

Theorem: $\mathsf{HeurBQP/qpoly} \subseteq \mathsf{HeurQMA/poly}$

Or in English: we can use trusted classical advice to verify that untrusted quantum advice will work on most inputs.

Proof idea: The classical advice to the HeurQMA/poly verifier will consist of “training inputs” $x_1,\dots,x_m$, where m = poly(n), as well as whether $x_i \in L$ for all i. Given a purported quantum advice state $|\psi\rangle$, the verifier first checks that $|\psi\rangle$ yields the right answers on the training inputs, and only then uses it on its real input x. By the Quantum Occam’s Razor Theorem, if $|\psi\rangle$ passes the initial test, then w.h.p.
it works on most inputs. The technical part is doing the verification without destroying $|\psi\rangle$.

Stronger Result: $\mathsf{HeurBQP/qpoly} = \mathsf{HeurYQP/poly}$

Here YQP (“Yoda Quantum Polynomial-Time”) is like $\mathsf{QMA} \cap \mathsf{coQMA}$, except that a single witness must work for all inputs of length n.

Open Problems
• Computationally efficient learning algorithms
• Experimental implementation!
• Tighter bounds on the number of measurements
• Does BQP/qpoly = YQP/poly?
• Is $D^1(f) = O(M\,Q^1(f))$?
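To make the contrast between exponential tomography and the Quantum Occam’s Razor bound concrete, here is a small Python sketch. It evaluates the right-hand side of the theorem’s sample bound and compares it with the number of local Pauli settings ($3^n$: X, Y, or Z on each qubit) used in standard tomography. Assumptions, all ours rather than the talk’s: the unspecified constant K is set to 1, logs are natural logs (this only changes constants), and the 656,100 figure for the 8-qubit record is read as $3^8$ settings times 100 repetitions each. The absolute numbers from the bound are large because the constants are loose; the point is that they grow linearly in n.

```python
import math


def occam_sample_bound(n, gamma, eps, delta, K=1.0):
    """Right-hand side of the Quantum Occam's Razor sample bound.

    K stands in for the theorem's unspecified constant (1.0 is an
    illustrative assumption); natural logs are assumed throughout.
    """
    c = gamma * eps
    return (K / c**2) * ((n / c**2) * math.log(1 / c) ** 2 + math.log(1 / delta))


def pauli_tomography_settings(n):
    """Local Pauli measurement settings for n qubits: X, Y, or Z on each."""
    return 3**n


# Assumed reading of the 8-qubit record: 3^8 settings x 100 repetitions.
print(pauli_tomography_settings(8) * 100)  # 656100

for n in (8, 80, 800):
    digits = len(str(pauli_tomography_settings(n)))
    bound = occam_sample_bound(n, gamma=0.1, eps=0.1, delta=0.01)
    print(f"n={n}: 3^n has {digits} digits; Occam bound ~ {bound:.3g}")
```

Doubling n roughly doubles the Occam bound, while the number of Pauli settings gains about $0.48n$ decimal digits.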
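The $\gamma$-shattering definition used in the proof can be checked by brute force when the function class is finite. The sketch below takes the class as a table of values and a fixed choice of witnesses $a_1,\dots,a_k$, then tries all $2^k$ above/below patterns. The function name and data layout are our own illustrative choices. Because the definition asks whether *some* witnesses exist, a `True` result certifies shattering, while `False` only means these particular witnesses fail.

```python
from itertools import product


def is_gamma_shattered(values, witnesses, gamma):
    """Check gamma-shattering of k points by a finite class, for fixed witnesses.

    values[j][i] is f_j(x_i) for the j-th function in the class and the
    i-th point; witnesses[i] is the real a_i from the definition.
    """
    k = len(witnesses)
    for pattern in product((False, True), repeat=k):
        # pattern[i] True:  require f(x_i) >= a_i + gamma
        # pattern[i] False: require f(x_i) <= a_i - gamma
        realized = any(
            all(
                (f[i] >= witnesses[i] + gamma) if up else (f[i] <= witnesses[i] - gamma)
                for i, up in enumerate(pattern)
            )
            for f in values
        )
        if not realized:
            return False  # some pattern has no function in the class
    return True


# Two constant functions (0 and 1) on two points cannot realize "low, high".
print(is_gamma_shattered([[0.0, 0.0], [1.0, 1.0]], [0.5, 0.5], 0.25))  # False
# All four 0/1-valued functions on two points do shatter them.
print(is_gamma_shattered([[0, 0], [0, 1], [1, 0], [1, 1]], [0.5, 0.5], 0.25))  # True
```

The size of the largest set for which some witnesses make this return `True` is exactly $\mathrm{fat}_C(\gamma)$; the Corollary above says that for $C_n$ this size is only $O(n/\gamma^2)$.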