QEC 2014, Zurich, Switzerland

Fault-Tolerant Quantum Computation in Multi-Qubit Block Codes

Todd A. Brun
University of Southern California
With Ching-Yi Lai, Yi-Cong Zheng, Kung-Chuan Hsu

Why large block codes?

Many schemes for fault-tolerant quantum error correction have been proposed. All require relatively low rates of error, though the ability to tolerate errors has slowly been improved by a long string of theoretical developments.

Most also require a large amount of overhead, in some cases a very large amount. A logical qubit is encoded in hundreds or thousands of physical qubits, if not more.

It was long ago observed (by Steane, and others) that codes encoding multiple qubits can achieve significantly higher rates for the same level of protection from errors. But performing logical gates in such block codes is difficult.

I will briefly sketch a scheme for computation where multi-qubit block codes are used for storage (storage blocks). Clifford gates and logical teleportation are done by measuring logical operators, and a different code (the processor block) is used with a non-Clifford transversal gate for universality. Syndromes and logical operators are measured by Steane extraction, so this scheme uses only ancilla preparation, transversal circuits, and single-qubit measurements.

Steane, Nature 399, 124-126 (1999).

Outline of the scheme

1. When not being processed, logical qubits are stored in [[n,k,d]] block codes. These codes are corrected by repeatedly measuring the stabilizer generators via Steane extraction.
2. Logical Clifford gates can be done by measuring sequences of logical operators. This is also done by Steane extraction with a modified ancilla state.
3. A non-Clifford gate is done by teleporting the selected logical qubits into the processor block, and teleporting back after the gate. Teleportation is also used between storage blocks.
4. The processor blocks use codes that allow transversal circuits for the encoded gates.
E.g., for the T gate we could use the concatenated [[15,1,3]] shortened Reed-Muller code.
5. Logical teleportation is also done by measuring a sequence of logical operators.

[Figure: Outline of the scheme. Labels: Memory Array (blocks M), Processor Array (blocks P), Ancilla Factory (ancillas Am, Ap).]

Correcting Errors

The storage blocks should use a code that stores multiple qubits with high distance, and a rate that is better than that achieved by encoding the qubits separately (as in a concatenated scheme or topological code). The logical error rate must be sufficiently low that we expect no logical errors in any block for the entire duration of the quantum computation.

The processor block must also have a very low rate of logical errors, but the demands are not quite as stringent as for the storage blocks. The rate must be low enough that the probability of an uncorrectable error during any of the non-Clifford gates is low.

How large a computation can be done in this scheme?

Clearly, for a fixed size of code block, one cannot do computations of unlimited size. However, it may be possible to protect information for sufficiently long to do a nontrivially large quantum computation. If the error rate is low enough, it may be possible to carry out any computation we could conceivably do in practice.

Let $p_S$ be the logical error rate of the storage block, and $p_P$ be the logical error rate of the processor block, per “round” of computation. One storage block encodes k logical qubits. Suppose we wish to do a computation involving N logical qubits with circuit depth D (measured in rounds of computation), and using M non-Clifford gates. Then, roughly speaking, we expect to be able to complete this computation if

$$\frac{ND}{k} \ll \frac{1}{p_S} \quad\text{and}\quad M \ll \frac{1}{p_P}.$$

A “round” of computation is essentially one iteration of Steane syndrome extraction. We call $1/p_S$ and $1/p_P$ the code lifetimes.

Steane Syndrome Extraction

Steane syndrome extraction can be done on any CSS code.
All the X-type stabilizer generators are measured in one shot, and so are all the Z-type stabilizer generators. This requires two ancilla states, each of which is a copy of the same code as the codeword. The first is prepared with all the logical qubits in the $|+\rangle$ state, the second all in the $|0\rangle$ state. After the circuit of transversal CNOTs, all the ancilla qubits are measured in the Z or X basis, respectively. The syndrome bits are parities of these measurement outcomes.

Measuring logical operators

A simple variant of the Steane extraction procedure will also let us measure logical operators. Which logical operators are measured is determined by the logical state of the ancillas.

Suppose we want to measure the logical operator $Z_j$ for the jth logical qubit. Then we prepare the jth logical qubit of the Z ancilla state in the state $|0\rangle$ instead of $|+\rangle$. We determine the value of $Z_j$ by taking a parity of qubit measurements, after first carrying out a classical error correction step.

Similarly, suppose we want to measure the logical operator $X_j$ for the jth logical qubit. Then we prepare the jth logical qubit of the X ancilla state in the state $|+\rangle$ instead of $|0\rangle$.

Two very nice features of this way of measuring logical operators:

1. Because the ancillas are themselves error-correcting codes, we can correct some errors in the measurement procedure. That makes this protocol robust against noise.
2. In the course of measuring the logical operators, we also extract all the error syndromes. So this can just be substituted for the usual round of syndrome extraction.

Measuring products of logical operators

We are not limited to measuring single X’s or Z’s. Suppose I wish to measure $Z_iZ_j$. Prepare logical qubits i and j of the Z ancilla in the Bell state $|\Phi^+\rangle$. Then one can extract the product operator $Z_iZ_j$ in an exactly analogous way. One can similarly measure products of logical X operators.

What about measuring logical Y operators?
Or, more generally, arbitrary products of X’s and Z’s? This is done by preparing both ancillas in an entangled state.

As mentioned above, this procedure has some robustness against noise. But if that is not sufficient, one can repeat the measurement a few times.

While measuring logical operators is a useful thing to be able to do, we will use it as a building block for two key operations: Clifford gates, and logical teleportation.

Clifford gates

If we have a buffer qubit in addition to the qubit(s) we wish to act on, we can do arbitrary Clifford gates by measuring a sequence of logical operators.

Suppose we have two qubits and a buffer qubit in the state $|\psi_1\rangle|\psi_2\rangle|0\rangle_3$. We can carry out a CNOT gate by measuring this sequence of logical operators: $X_2X_3$, $Z_1Z_2$, and $X_1$. This leaves the three qubits in the state $P\,\bigl(|+\rangle \otimes C_{12}(|\psi_1\rangle|\psi_2\rangle)\bigr)$, where P is a (known) Pauli operator acting on the three logical qubits. This can either be corrected, or kept track of, along with the Pauli operators from the syndrome extraction.

See, e.g., D. Gottesman, Caltech Ph.D. Thesis, 1997.

We can similarly do a Hadamard gate by measuring ZX and XI, and a Phase gate by measuring XY and ZI. One can also do SWAPs, and reset the buffer qubit in the Z or X basis.

Notice that the CNOT can be done by measuring pure X and pure Z operators, but the Hadamard and Phase gates require us to measure products of both.

One Clifford gate, by this method, takes 2 or 3 “rounds” of the computation (more, if we need to repeat measurements). Some overhead may also be necessary if we must return the buffer qubit to its original state and location.

Performing non-Clifford logical gates

The above scheme allows us to perform an arbitrary Clifford operation on the logical qubits within a single storage block. But it does not allow us to transfer logical qubits between blocks. We also need the ability to do at least one non-Clifford gate to get universality.

For both of these purposes we use logical teleportation.
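The CNOT measurement sequence from the Clifford gates slide (X2X3, Z1Z2, X1, starting with qubit 3 in a Z eigenstate) can be checked by tracking how the logical operators transform under each measurement. The following is a minimal sketch, not the authors' code: it ignores all signs (these make up the known Pauli P), assumes every measured operator anticommutes with a current stabilizer generator, and uses illustrative helper names (`pauli`, `measure`, `equivalent`) that are not from the talk. Under these conventions the tracked operators end up as a CNOT acting on qubits 2 and 3, with qubit 1 left in an X eigenstate.

```python
from itertools import product

def pauli(label):
    """Parse a label such as 'X2X3' into bitmasks (x, z); qubits are 1-indexed."""
    x = z = 0
    for kind, qubit in zip(label[0::2], label[1::2]):
        bit = 1 << (int(qubit) - 1)
        if kind in "XY":
            x |= bit
        if kind in "ZY":
            z |= bit
    return (x, z)

def multiply(a, b):
    """Product of two Paulis, ignoring the overall phase."""
    return (a[0] ^ b[0], a[1] ^ b[1])

def anticommute(a, b):
    """True if the two Paulis anticommute (symplectic inner product is odd)."""
    overlap = bin(a[0] & b[1]).count("1") + bin(a[1] & b[0]).count("1")
    return overlap % 2 == 1

def measure(stabilizers, logicals, m):
    """Update stabilizer generators and logical operators after measuring m.

    Assumes m anticommutes with at least one generator (random outcome);
    the sign of the outcome is not tracked in this sketch.
    """
    idx = next(i for i, g in enumerate(stabilizers) if anticommute(g, m))
    g = stabilizers[idx]
    new_stabs = [multiply(s, g) if i != idx and anticommute(s, m) else s
                 for i, s in enumerate(stabilizers)]
    new_stabs[idx] = m
    new_logs = {name: multiply(op, g) if anticommute(op, m) else op
                for name, op in logicals.items()}
    return new_stabs, new_logs

def equivalent(a, b, generators):
    """True if a and b differ only by an element of the stabilizer group."""
    diff = multiply(a, b)
    for picks in product([0, 1], repeat=len(generators)):
        element = (0, 0)
        for pick, g in zip(picks, generators):
            if pick:
                element = multiply(element, g)
        if element == diff:
            return True
    return False

# Initial state: data qubits A (on 1) and B (on 2), buffer qubit 3 stabilized by Z3.
stabilizers = [pauli("Z3")]
logicals = {"XA": pauli("X1"), "ZA": pauli("Z1"),
            "XB": pauli("X2"), "ZB": pauli("Z2")}

for m in ["X2X3", "Z1Z2", "X1"]:
    stabilizers, logicals = measure(stabilizers, logicals, pauli(m))

# A CNOT with control A and target B, with outputs sitting on qubits 2 and 3.
expected = {"XA": pauli("X2X3"), "ZA": pauli("Z2"),
            "XB": pauli("X3"), "ZB": pauli("Z2Z3")}
cnot_ok = all(equivalent(logicals[k], expected[k], stabilizers) for k in expected)
print("final stabilizer is X1:", stabilizers == [pauli("X1")])
print("logical operators match a CNOT:", cnot_ok)
```

Tracking only the (x, z) bit patterns is enough here because the measurement outcomes affect only signs, which the scheme absorbs into the Pauli frame.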
We can teleport a qubit to another storage block to allow CNOTs between blocks. For a non-Clifford gate, we teleport the qubit into a processor block, which allows a transversal gate outside the Clifford group. For example, the concatenated [[15,1,3]] truncated Reed-Muller code allows a transversal T gate.

Logical teleportation

We can teleport logical qubits between code blocks by measuring an appropriate set of logical operators fault-tolerantly. Suppose our [[n,k,d]] code has k pairs of logical operators $(X_1,Z_1), (X_2,Z_2), \ldots, (X_k,Z_k)$, and we want to teleport into another code block with a pair of logical operators $(X_0,Z_0)$. We will use logical qubit 1 as a buffer to hold half of an entangled pair, and logical qubits 2,…,k hold actual qubits to be stored. Suppose we wish to act on logical qubit 2.

1. Measure logical operators $X_0X_1$ and $Z_0Z_1$ to prepare a logical Bell state.
2. Measure logical operators $X_1X_2$ and $Z_1Z_2$ to teleport.
3. Do a transversal circuit on the second codeword to correct (if necessary) and apply the desired logical gate.
4. Measure logical operators $X_0X_1$ and $Z_0Z_1$ to teleport back.
5. Do a transversal circuit on the first codeword to correct (if necessary).

Logical teleportation, as we see, can also be done by measuring logical operators. But now we must measure products of logical operators on different code blocks.

We do this in exactly the same way as measuring any other logical operator, but now we must prepare the ancillas of two different code blocks in an entangled state.

(We can also use this trick to directly do CNOTs between logical qubits of two different storage blocks.)

Performance of the scheme

How well does this scheme perform in practice? To find out, we must choose particular codes for the storage and processor blocks, and do a combination of analysis and numerical simulation.

As a first step, we are trying to estimate the code lifetimes for different choices of codes.
Such an analysis should, in general, include all the different sources of noise in the system. These include:

• Gate errors
• Memory errors
• Measurement errors
• Ancilla preparation errors

To simplify the analysis, we first transform all these noise sources into a single effective error process, which we can treat as acting only between rounds of the computation.

Effective Errors

[Figure: transversal Steane extraction circuit for one data qubit and two ancilla qubits (measured as Mz and Mx), with error locations E11 to E13, E21 to E23, and E31 to E33, and effective errors Ei and Ef before and after the circuit.]

The individual qubits in Steane extraction undergo transversal circuits like the one above. We have marked the locations where errors can occur. Crucially, we assume that these errors are uncorrelated between qubits.

The effective error process acts before and after the circuit, which we treat as ideal. The errors, however, are now generally correlated in time. These correlations are potentially useful in diagnosing errors.

Ancilla preparation errors can produce correlated errors.

Gate errors...

[Figure: panels (a) and (b), showing X errors and Z errors, respectively, in the extraction circuit with Mz and Mx measurements.]

Measurement errors...

One-time effective error process

If we ignore the time correlations, we can get an effective error process for a single time:

[Figure: effective errors Ei, Em, and Ef acting on the data qubit, with the two ancillas measured as Mz and Mx.]

We could make use of the correlations to do soft-decision decoding. But if the probability of an uncorrectable error at a single time is sufficiently low, that is already enough to get a lower bound on the code lifetime.

From the above, we see that the effective error rate is going to be a multiple (but not necessarily a large multiple) of the largest basic error process (e.g., CNOT errors).

Storage blocks

To build storage blocks that compromise between high distance, high rate, and efficient decoding, we looked at concatenations of pairs of codes.

Bottom level: [[23,1,7]] quantum Golay code.
Top level: [[89,23,9]], [[127,57,11]] or [[255,143,15]] quantum BCH code.
These combinations would give us storage blocks with parameters [[2047,23,63]], [[2921,57,77]], or [[5865,143,105]], respectively.

At least one logical qubit in each block must be set aside to act as a buffer qubit. So these three combinations represent a logical qubit by approximately 93, 52, or 41 physical qubits. Very good rates!

The Golay code can be decoded by the Kasami error-trapping algorithm, and the BCH codes by the Berlekamp-Massey algorithm.

Estimated Performance

[Figure: estimated performance of the [[2047,23,63]] (blue), [[2921,57,77]] (red), and [[5865,143,105]] (green) storage blocks.]

These curves are generated by Monte Carlo simulations and linear extrapolation. However, the extrapolation at low error rates can be backed up by an upper bound calculated purely from the distances of the code.

For $p_{\mathrm{eff}} = 0.007$, we get bounds on the error rates of approximately $10^{-16}$, $2.5\times10^{-19}$, and $7\times10^{-24}$.

Processor blocks

For the processor block, we looked at two and three levels of concatenation of the [[15,1,3]] code. With hard-decision decoding the performance of this code is very poor at high error rates, largely because it is very limited in its ability to correct phase errors.

With Poulin’s soft-decision decoding algorithm the performance improved markedly, even at effective error rates above 0.01. The upper bound used for the storage blocks is no use for soft-decision decoding, and extrapolating from Monte Carlo is unlikely to be reliable in this case. But we were able to find rough bounds at moderately low error rates.

D. Poulin, Phys. Rev. A 74, 052333 (2006).

[Figure: the [[15,1,3]] code at two (blue) and three (red) levels of concatenation. A combination of Monte Carlo simulations at higher errors, a rough bound at lower errors, and (not to be relied upon) extrapolation.]

At $p_{\mathrm{eff}} = 0.007$ (all contributing error processes below $5\times10^{-4}$), the block error rate is estimated to be roughly $2\times10^{-12}$. This is certainly small enough to carry out highly nontrivial quantum computations.
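The arithmetic behind the quoted storage-block parameters and the size of computation they might support can be reproduced in a few lines. This is a rough sketch under stated assumptions, not a result from the talk: it takes the concatenated distance to be the product of the two distances (as the quoted parameters do), sets aside exactly one buffer qubit per block, treats the quoted bounds at p_eff = 0.007 as the per-round rates p_S and p_P, and reads the "much less than" conditions as plain ceilings. The names `concatenate`, `storage_bounds`, and `rows` are illustrative.

```python
# Code parameters [[n, k, d]] and error-rate estimates as quoted in the slides.
inner = (23, 1, 7)                                      # quantum Golay code
outers = [(89, 23, 9), (127, 57, 11), (255, 143, 15)]   # quantum BCH codes
storage_bounds = [1e-16, 2.5e-19, 7e-24]  # assumed to be p_S per round
p_processor = 2e-12                       # assumed to be p_P per round

def concatenate(inner_code, outer_code):
    """Parameters of outer code concatenated with a k=1 inner code.

    The distance is taken as the product d_in * d_out (assumption: this is
    the value quoted in the slides; in general it is a lower bound).
    """
    n_in, k_in, d_in = inner_code
    n_out, k_out, d_out = outer_code
    assert k_in == 1, "this sketch only handles a single-qubit inner code"
    return (n_in * n_out, k_out, d_in * d_out)

rows = []
for outer, p_s in zip(outers, storage_bounds):
    n, k, d = concatenate(inner, outer)
    rows.append({
        "code": (n, k, d),
        # One logical qubit per block is reserved as the buffer qubit.
        "qubits_per_logical": n / (k - 1),
        # Condition N*D/k << 1/p_S, rewritten as a ceiling on N*D.
        "max_ND": k / p_s,
    })

# Condition M << 1/p_P, rewritten as a ceiling on the non-Clifford gate count.
max_non_clifford = 1.0 / p_processor

for row in rows:
    print(row["code"], round(row["qubits_per_logical"], 1),
          f"N*D ceiling ~ {row['max_ND']:.1e}")
print(f"non-Clifford gate ceiling ~ {max_non_clifford:.1e}")
```

Because the conditions are strong inequalities, a practical budget would sit well below these ceilings; the numbers are only meant to indicate orders of magnitude.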
Ancilla preparation

We have only begun to explore the problem of ancilla preparation for this approach. The first idea we are studying is a type of ancilla distillation: prepare M imperfect copies of the ancilla, then perform a transversal error correction step that yields a smaller number $FM$ of high-quality ancillas, where F is the yield fraction. A very important requirement is that error correlations within each ancilla should be very small. (Across ancillas is not a problem.)

How does this affect the resource requirements for this scheme? If the code is [[n,k,d]], and the distillation takes $T_d$ rounds with yield fraction F, then the physical qubits needed per logical qubit are increased by a factor of

$$\frac{n}{k}\left(\frac{2T_d}{F} + 1\right).$$

If our available gates are local (as they almost certainly will be), then ancilla distribution will also increase the demand on the resources. That is yet a further question to explore.

Threshold theorems?

Does a scheme like this have a threshold? The short answer: I don’t know.

• It certainly does not have a threshold with a fixed size of storage blocks.
• To prove a threshold theorem, we would have to find a family of sets of code blocks where we could prove that lifetimes can be scaled up with problem size for sufficiently low probabilities.
• This would be a very desirable result, but not, for practical purposes, a necessary one. A single storage block code could have a lifetime sufficiently long to do any conceivable quantum computation in practice.
• We are using fault-tolerant methods to avoid propagating errors, etc., without doing computations of arbitrary size.

Open questions and future work

The number of open questions is large, including: does this really work? Here are some big ones:

• Can we find better codes for block storage? Perhaps quantum LDPC codes? Can we do the decoding efficiently in real time during a computation? It is encouraging that without trying very hard to find good codes, we got decent performance.
• What are the resource requirements for the construction, verification, and distribution of the ancilla states? While these are all stabilizer states, and can in principle be prepared fault-tolerantly, the resources needed for them will probably exceed those for the codewords by a factor of “a lot.” Currently we are investigating (hopefully) efficient distillation algorithms for these ancillas.
• If we include locality and communication costs, how much does this performance degrade? These costs are mainly in preparing and distributing the ancillas.
• Are there better (or additional) choices of codes for the processing blocks? We have looked a little at 3D color codes, but haven’t (so far) found an example that outperforms the concatenated [[15,1,3]] code.
• Can we find families of codes where a threshold theorem can be proven for this scheme?

Thank you for your attention!