>> Krysta Svore: Good afternoon. Martin Suchara is a Postdoctoral Scholar at UC Berkeley. His research interests are in quantum computation, including quantum error correction and quantum algorithms. He has been working on the development of new quantum error correcting codes that have a high error correction threshold and can be efficiently decoded. There's a long bio. He's done a lot of good stuff, and we're thrilled to have him here. Martin. >> Martin Suchara: Thank you for the introduction. Today I am going to talk about quantifying the resources needed to do real-world computations on a quantum computer using the topological and concatenated error correcting codes. Quantum error correction is, of course, a very important problem. You cannot build a quantum computer without it. And it is also a challenging problem because it's different than classical error correction where, with classical information, you have just bits of zeros and ones, and the only error that [inaudible] is a bit flip. But quantum information is continuous, and we have a whole range of partial errors but also phase flips and phase shifts that we need to error correct. The two main families of quantum error correcting codes that can address these errors are the concatenated codes, which were developed starting with [inaudible] Peter Shor in 1995, and topological error correcting codes, which started with [inaudible] Alexei Kitaev in 1997. Both of these code families have some advantages and disadvantages, and these are summarized in this table. The topological codes have a much higher error correction threshold, which means they can tolerate much higher error rates. And also computation with topological codes can be done using local operations. For concatenated codes, if we, for example, wanted to do the two-qubit controlled-NOT operation, we typically have to swap qubits before we can perform a local controlled-NOT operation. But for topological codes we can use braiding, which can be done purely by using local operations, and no swapping is needed. But for topological codes, we have to solve a difficult problem to decode the errors. Typically a minimum weight matching problem is used to guess which error occurred given the syndrome information. But for concatenated codes the decoding is much simpler. So the classical controller that controls these error correcting codes is a bit more involved for the topological codes. Also, it is not clear which code is better in terms of number of qubits, number of gates and [inaudible] all the resources. And this is the focus of this talk. I'm going to address the issue of quantifying the resources needed to do error correction with these two code families. And also I will look at some ideas I have about simplifying the decoding for the topological codes. So this is the structure of the talk: first just some background about concatenated and topological codes. Then I will lay out the methodology that I used to quantify the resources needed to do error correction with the two code families. And finally I will speak about building a faster decoder for decoding errors in topological error correction. The stabilizer formalism is very useful to characterize error correcting codes. The stabilizers are basically the syndromes we need to measure to diagnose errors. They are generated by the stabilizer group, and the elements of this stabilizer group [inaudible]. And the stabilizers act trivially on the codespace.
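In symbols (standard notation, not something shown on the slides), this condition can be written as

\[
  S\,|\psi\rangle = |\psi\rangle
  \qquad \text{for all } S \in \mathcal{S} \text{ and all codewords } |\psi\rangle ,
\]

and, as a simple illustration, the three-qubit repetition code spanned by $|000\rangle$ and $|111\rangle$ has stabilizer group

\[
  \mathcal{S} = \langle\, Z_1 Z_2 ,\; Z_2 Z_3 \,\rangle ,
\]

so measuring the two generators returns the error syndrome without disturbing the encoded state.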
So if we have a state, psi, which is in the codespace, then the stabilizer doesn't change the state, which means we can safely measure these syndromes without affecting the encoded information. And we can learn about errors from these syndrome measurements. Concatenated codes. Perhaps the simplest example of a concatenated code is the Bacon-Shor code, which is the quantum analog of the classical repetition code where we encode each bit three times and then we use a [inaudible] to decide whether we encoded zero or one. So to encode logical state zero in the Bacon-Shor code, you need to protect against not only a bit flip but also a phase flip. So to encode zero we use three plus states and to encode one we use three minus states initially. And then to protect these pluses and minuses against bit flips, we use the repetition code, so zero would become triple zero and one would become triple one. So as the name suggests, concatenated codes have a concatenated structure, so by repeatedly using smaller building blocks we can build the code from the ground up. And up to a certain limit this will improve the error correcting properties of the code. Decoding of errors with concatenated codes is very simple. In the case of the Bacon-Shor code it's basically just a simple [inaudible] as the stabilizers. What is often cited as an advantage of these concatenated codes is the transversal nature of many of the gates. So what does this mean? If a gate, or an operation, is transversal in the code, then we simply need to apply the gate n times for each of the n building blocks in the code. So, for example, the controlled-NOT operation is transversal. So we have these two blocks of nine qubits each which represent two logical qubits. And to perform the controlled-NOT operation we need to do CNOTs between the corresponding pairs of qubits. But actually, from the resource estimation perspective, even though this gate is transversal it is fairly expensive because we cannot generally do controlled-NOT operations on a physical quantum computer between any pair of qubits. They have to be located at the same location; they have to be neighboring qubits. So that means we have to use the [inaudible] operation to move the qubits around, and this is one reason why using concatenated codes is expensive. And then, of course, we also have gates that are not transversal. So it's possible to show that we need at least some non-transversal gates to do universal quantum computation with the concatenated codes. And so here is an example of the T gate, which is non-transversal in the Bacon-Shor code. So the gate uses an ancillary state that needs to be distilled. This is the distillation circuit which takes 15 copies of the ancillary state T plus, and it distills a single copy of the ancillary state with higher precision. And this distillation process needs to be repeated a sufficient number of times to achieve the target fidelity of the ancillary state. And then the ancillary state is used in this circuit to apply the T gate. So the original state psi was here. Here was our ancilla, and after this circuit is applied the T gate is applied to the state psi, which appears here. Topological quantum error correcting codes have a very different structure. They consist of qubits that live in a regular grid. In this picture this is the surface code of Alexei Kitaev [inaudible].
The qubits live at the edges of the grid, and the stabilizers are shown in this picture -- we have two types of stabilizers, the stars here and the [inaudible]. And by measuring these stabilizers we can extract error syndromes, which give us information about the errors that occurred. There are other examples of topological quantum error correcting codes. This is the 5-square code that I developed at IBM. Each code has a little bit different properties. One of the key advantages of this code is that in order to reconstruct the syndromes we only need to do two-qubit measurements. So the red objects in this figure are the stabilizers, but they decompose into smaller gauges. And by measuring gauge operators of weight two we can reconstruct the syndrome information. And this can be beneficial in situations where weight-four entangling measurements are too expensive to perform on certain quantum technologies. In addition to locality, another advantage of the topological codes is the way they perform computation. Computation can be done by braiding; controlled-NOT operations can be done by braiding. So how does this work exactly? Well, the Pauli operators logical X and logical Z are strings in the lattice. So here is the basic lattice which doesn't contain any defects, which means that the syndrome measurements are enforced on the entire surface of the lattice. And this lattice in this state encodes two logical qubits. And this picture shows the logical X and Z operators corresponding to these two logical qubits. Now let's say we want to encode more qubits and we want to perform controlled-NOT operations in this system, so what do we have to do? Well, it turns out that if we puncture a hole in the surface, we will increase the number of logical qubits that are encoded. So first of all, what do I mean by puncturing a hole? Do we have to change the physical layout of the system? And the answer is no. To puncture a hole, we simply stop enforcing the measurements of the stabilizers in the region where the hole appears. So this is the hole, and we don't measure any of the stabilizers in that region. And if we introduce two such holes, that pair of holes will represent one logical qubit. And this is easy to see because the logical X and Z operators will be strings that connect the pairs of defects and strings that loop around the holes. Now which string is X and which string is Z depends on the exact location of the hole in the lattice, because there are two kinds of holes: rough holes and smooth holes. But regardless, to perform controlled-NOT operations in this system we simply take one of the holes, which corresponds to the control qubit of the controlled-NOT operation, we move the hole to the target qubit, then we braid the hole around the hole of the target and we move it back. Now this works if the control is a smooth hole and the target is a rough hole. And there is a simple conversion circuit that can address the problem when we have both control and target qubits represented by pairs of smooth holes. So this is very convenient because this operation can be done simply by changing the location of the syndrome measurements, so there is no physical movement of information needed to do the controlled-NOT operations. And how do we decode errors that occur in these topological codes? Well, when we do our syndrome measurements we will detect locations with non-trivial syndromes, which are shown in red color in this figure. So these four syndromes were measured as non-trivial syndromes.
And then it is easy to see that these syndromes occur at the end points of strings of errors. So if we have a string of X errors, then at the end points of the string of errors there are going to be non-trivial syndromes. The same for Z errors: there is this star syndrome here and here which detects that there is this string of errors consisting of a single Z error here. So once we measure our syndromes, we can use minimum weight perfect matching to guess how these syndromes should be connected. We can use Edmonds' blossom algorithm, for example, to do this. And then once we connect the pairs of matching syndromes we will correct -- So this is a plaquette syndrome. This is also a plaquette syndrome, so we know if these syndromes are matched we need to do X corrections, bit flips, on a string connecting the syndromes. So the red X's represent the correction that is being done, and the black X marks show the actual error that occurred. So in this case, after we apply error correction, we just applied this loop operator consisting of bit flips on this loop here. But it turns out this is fine because this loop is a loop that is in the stabilizer group, so it doesn't affect the encoded state of the system. Now in reality this matching problem is actually a 3-dimensional problem, because our syndrome measurement itself will be faulty. We need to use a quantum circuit to measure the syndromes, and that's going to be prone to errors. So some of the syndromes are going to show as false even though no error occurred, and vice versa. So to address this we can use basically an analog of the 2-D problem by introducing the third dimension. And here in the third dimension each of these lines represents a single syndrome that is measured over and over again. And we will mark a red point if the syndrome measurement outcome is different than the measurement outcome in the previous round. And this way we will obtain a set of points in three dimensions that we can match. And now, depending on the error rate of measurement versus the error rate in the memory, we can adjust the weights of these edges in the temporal dimension compared to the weights of the edges in the space dimension. And if we solve the minimum weight matching problem, then with high probability we will correct the errors that occurred. This problem is -- So this is a classical problem that the classical [inaudible] has to solve. But it turns out if we have millions of qubits and we need to repeat the syndrome measurements a certain number of times, this is going to be a large problem that is going to be expensive on a classical computer. The surface code, of course, exhibits threshold behavior, so there is a certain threshold. If the error rate is below the threshold, we can correct the errors that occur at this rate perfectly well as long as the code distance is big enough. So as long as we have enough physical qubits encoding the information, we can correct the errors. If we exceed the threshold, we cannot recover the information. So I did a simple simulation that estimates the threshold for various topological codes. I used C++, and I did a Monte Carlo simulation. So I will inject a random error into the system according to the probability model that is described here, and then I will try to decode the error and correct it. And I will record the frequency with which I can correct the errors at the given probability level after a certain number of Monte Carlo repetitions. And this is the result that will come out of the simulation tool.
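As a rough, self-contained illustration of what such a Monte Carlo loop looks like, here is a toy C++ sketch. It substitutes a distance-d repetition code with majority-vote decoding for the actual surface code and matching decoder, so the numbers it prints only show the shape of the procedure, not the real thresholds.

// Toy Monte Carlo threshold estimate in the spirit of the simulation described
// above, using a distance-d repetition code with majority-vote decoding as a
// stand-in for the surface code and its matching decoder.
#include <cstdio>
#include <random>

// One trial: inject independent bit flips with probability p on d qubits,
// decode by majority vote, and report whether the logical bit was corrupted.
bool trial_fails(int d, double p, std::mt19937 &rng) {
    std::bernoulli_distribution flip(p);
    int flips = 0;
    for (int i = 0; i < d; ++i)
        if (flip(rng)) ++flips;
    return flips > d / 2;   // majority of qubits flipped => logical error
}

int main() {
    std::mt19937 rng(12345);
    const int trials = 100000;
    for (int d : {3, 5, 9, 17}) {                   // code distances
        for (double p : {0.02, 0.05, 0.10, 0.20}) { // physical error rates
            int failures = 0;
            for (int t = 0; t < trials; ++t)
                if (trial_fails(d, p, rng)) ++failures;
            std::printf("d=%2d  p=%.2f  failure rate=%.4f\n",
                        d, p, static_cast<double>(failures) / trials);
        }
    }
    return 0;
}

The real tool replaces trial_fails with error injection on the full code followed by minimum weight perfect matching, but the overall structure -- inject, decode, count failures -- is the same.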
So here I varied the error rate with which I injected errors into the system. And on the Y axis is the percentage of the time that I can successfully recover from the error. And you can see that as I increase the distance of the codes, as I increase the number of physical qubits that encode the information, the curve in this picture will get sharper and sharper. Which means if I am just below the threshold, the percentage of failures is going to be very small. So almost always I will be able to recover. And if I'm above the threshold, then the percentage of failures is large, so I cannot recover. But if I choose... >> : When you say the percentage of failures, you mean the ones that were not recovered? >> Martin Suchara: Percentage of failures is the number of Monte Carlo iterations that resulted in failure, so I could not recover the original encoded state. And we can see that we need to choose a code of sufficient distance. If we want a certain -- Let's say we want at most a 5% probability of failing and our physical error rate is 3%, well then we need to choose a code distance of at least 8 to guarantee this performance. So the key properties of topological codes: it is easy to increase and decrease the code distance and that way influence the reliability of the resulting code. Local operations are sufficient to do controlled-NOT operations and basically computation with the codes, and they have an error correction threshold that is significantly higher than the threshold of concatenated codes. But the drawback is that the classical processing for error decoding is time consuming. So next I'm going to move to the actual resource estimation with the topological and concatenated codes. First I'm going to give a brief overview of the properties of the quantum technologies that we consider as part of the quantum computer science project and which form the base of this resource estimation. Then I'm going to show you the methodology that I used to quantify the resources needed to perform operations with the error correcting codes and the numerical results. So this is the structure of the resource estimation task. We will consider a certain set of quantum algorithms, and for each of these algorithms we will express the number of logical qubits we need to perform the task, the number of logical gates to do the computation, and also information about the parallelization factor and the length of the two-qubit operations in the system. For the quantum technologies we will consider the gate times and fidelities and the memory error rate on qubits that sit idle in the system. And we will also consider properties of the four basic families of error correcting codes: Bacon-Shor, [inaudible], C4/C6 and the surface code. And all this information feeds into our resource estimation tool that produces a qubit layout and estimates the circuit delay, gate count and the fidelity of the entire computation. Yes? >> : You say number of logical qubits, but that number probably depends on the input size. >> Martin Suchara: Yes. So we will choose a specific problem of a certain size and then we will express this. So right now we finished phase one of the QCS project where the problem sizes were hard-coded. But now we are moving to phase two and we are going to parameterize the problem sizes so that we can adjust a simple parameter which describes the size of the problem and that way we can obtain gate counts. For example, let's say we want to run a quantum computation for one year. What size of problem can we solve?
So we can solve the inverse problem if we parameterize. And this is what we are working on right now. >> : Like for a given size you take the worst [inaudible] that you can get? >> Martin Suchara: For a given instance of the problem, are we considering... >> : [Inaudible] for a given size of instance, that's what you said [inaudible]. You said [inaudible] the size [inaudible]. >> Martin Suchara: The parameter would be the -- So let's say we want to solve some graph problem and you want to -- Let's say the triangle-finding problem, and you want to find if there is a triangle inside of a certain graph. Then the parameter would be the number of vertices of the graph. So of course the more vertices, the harder the problem is going to be and the higher the gate count is going to be. So this is a collaborative project. I have been coordinating the work on the resource estimates on behalf of the USC team. This is -- In the project, the resource estimation is currently being done by four teams independently. And I have a number of collaborators at a number of universities to whom I owe the great work of analyzing the properties of the algorithms and quantum technologies. In our project we are studying four families of error correcting codes: Bacon-Shor, [inaudible], C4/C6 and surface codes. We have seven algorithms, and all of these algorithms have very different qualitative properties; they use different quantum primitives. Some of them use quantum simulation. Some of them use the quantum random walk, the [inaudible] transform. So it's a very diverse group of algorithms. And we are also considering six technologies and six quantum control techniques that enable us to reduce the errors in these technologies. And in this talk, rather than presenting the cross-product of these results, which we had to obtain in the project, I'm just going to show the highlights of our results. So for the quantum technologies our goal is to obtain, for a range of realistic quantum technologies, the gate times and the gate errors to perform physical quantum gates in the system. And in the QCS program we studied the effects of quantum control protocols on the gate errors, basically which control techniques are effective at reducing the gate errors and which are not. Regarding our methodology, we used Monte Carlo simulations to insert random noise and study the effect of control protocols on this noise. We used optimization tools to optimize the choice of control parameters. And also we used gate constructions, because not all quantum technologies support all elementary quantum gates. And even if they do, sometimes a smart decomposition of the gate from a different set of gates can result in a smaller error. So this is one technique we used as well. And this technique was actually very helpful in reducing some of the errors for some technologies. Our two favorite technologies are superconducting qubits and ion traps. Superconducting qubits have errors that are [inaudible] Markovian in noise and, therefore, it is difficult to reduce these errors by using control techniques. The error of a gate is basically proportional to the duration of the gate, and if we use sophisticated control techniques that increase the duration of the gate, that will actually lead to an increase of the error. So primitive control is the most successful with superconducting qubits. And the errors would be roughly ten to the minus five for a quantum gate on average. Ion traps have extremely low error rates except for the measurement error.
But we were able to use a circuit decomposition that uses three measurements to bring the error of the measurement down to roughly ten to the minus nine, which is in line with the low errors of the other gates for ion traps. So these are our two favorite technologies because they have low errors and they can be used both with the topological codes and concatenated codes. Neutral atoms also work reasonably well -- oh well, sort of well -- with topological codes, but the errors there are about ten to the minus three, so that's just below the threshold for topological codes, but they cannot be used with concatenated codes. And for quantum dots and photonics our results currently do not allow us to use these with any of the existing quantum error correcting codes. So here is an overview of the numbers that feed into my resource estimation tool. So here are the main -- I just selected some of the technologies, superconducting qubits, ion traps and neutral atoms, and I'm showing the average gate time for the average operation. Our resource estimation tool actually takes the gate count for all the individual gates. So the information that feeds into the tool is more detailed than what this table shows. So this is the average gate time. Then we have the gate error for each gate operation. And then we have the memory error per unit of time, per nanosecond. So the key observation here is that ion traps have the lowest error rates by far, but also the gate times of ion traps are not very short. They are roughly 3 orders of magnitude slower than the gate times for superconductors. So superconductors are faster but they are a little bit more error prone. The errors would be ten to the minus five. So it is certainly interesting to compare these two technologies, because just by looking at this table it's not clear which one is better. And then we have neutral atoms, which are both slow and error prone, so they are certainly not going to be [inaudible] among these. >> : [Inaudible] the memory error rate is zero. >> Martin Suchara: That is right, but I'm not convinced that -- This is unphysical. I think this is an artifact of our model. So to do the estimation I picked three quantum algorithms. I picked algorithms that will give us reasonable running times. So in our project we are also considering algorithms that have just quadratic speedups, and these result in huge gate counts. We're considering running these algorithms on problem instances that are very ambitious. Again, this leads to big gate counts. So here I'm trying to handpick the algorithms and problem instances that will result in results that are human-readable. So if we actually built the quantum computer, we would be able to see the result of the computation in a couple of months. So one such problem is estimating the ground state of a molecule. I picked a molecule that is not too difficult to analyze. So I picked this glycine molecule, a small organic molecule, which only requires 50 basis functions to describe the ground state. And I calculated -- I decided to calculate the ground state in a fairly crude way with only five bits of accuracy. So this is the first problem. >> : Which basis was that? Which of the quantum [inaudible] basis did you use? >> Martin Suchara: I'm not sure about this. >> : STO-3G? 4G? P321? Don't remember? >> Martin Suchara: I'm not sure. >> : Okay. How many qubits did it require? [Inaudible]. >> Martin Suchara: Fifty qubits for the basis and then ten qubits additionally. The second problem is the binary welded tree algorithm.
So the problem formulation is as follows: I will take two binary trees and weld them together in the middle. So here the top part is one binary tree. Here at the bottom is the second tree. The two root vertices are special. One of them is marked as the start vertex and the other node -- the other root -- is the finish vertex. And the goal is to start at the start vertex and find the finish vertex by relying on an oracle. And the oracle will give us information about the graph. If we supply the oracle with a label, a name of a vertex in the tree, it will give us the names of the neighboring vertices. And each vertex actually has an exponential number of labels assigned to it, and the oracle will return just one of the labels. And once we query the oracle at a certain vertex, we have to decide which of those three neighbors we will move to next. And this way we have to find the finish vertex. And using classical computation it was shown that it is impossible to find the finish vertex in sub-exponential time. But there is a [inaudible] quantum algorithm that uses the continuous quantum random walk. And we chose this problem for tree depth 300, which is an instance size that would be very difficult to do on a classical computer. >> : [Inaudible] my question. >> Martin Suchara: Yes? >> : So tree depth is n equals 300, tree depth? >> Martin Suchara: This is the depth. So the... >> : So there's a lot of nodes on the tree? >> Martin Suchara: The number of nodes is going to be huge. There's going to be like -- Right. Exponential. >> : Binary tree, two to the... >> : [Inaudible]. >> : Well, but it gets bigger and gets smaller so it's half. >> : These are perfect trees? >> Martin Suchara: Yes, well, they are not exactly perfect. They are just almost perfect. I think we can assume that they are perfect for this purpose. They are not perfect in the middle because it turns out there is this strange feature of the problem. If we have exact binary trees and we weld them in the middle, we will have vertices of degree two in the middle. And then there exists a classical algorithm that can exploit this information. It knows, basically, when it hits the middle, and it becomes easy to guess at which point in the path we made the wrong turn. So they are scrambled a little bit, but they are basically binary trees. >> : So since you scramble it, it will be now with the remaining possible trees. >> : [Inaudible] path. >> : [Inaudible] you mean the center is a random [inaudible]? >> Martin Suchara: Right. >> : Yes. >> : Yes. >> : So now -- So in addition to the number of nodes there is a [inaudible] particular tree that you used, pair of trees? >> : So what I did the first time around, how [inaudible] number which depends only upon the size of the problem and it also depends on [inaudible]... >> : Permutation of the joining, of the [inaudible]... >> : Right. >> : Is this problem available in the public literature? >> Martin Suchara: It is. There are at least two papers I'm aware of that studied this problem... >> : [Inaudible] look it up. >> : Yeah... >> : [Inaudible] binary welded tree benchmark or something. >> : Oh, okay. >> : And find out enough detail to --. >> Martin Suchara: And then the third algorithm I studied is the triangle-finding problem. So, again, it's a graph problem where the graph has n vertices, and our task is to decide if there is a triangle in this graph. So the instance of the problem is set up to be this [inaudible] instance where the graph is dense but there are very few triangles.
Actually there is going to be either one triangle or no triangle at all. And the task of the algorithm is to decide whether the given instance of the graph has one triangle or no triangle. So the triangle here is in the middle, and then we have these components which contain -- each of them has n over six nodes. And these edges mean that all the vertices in the individual components are -- all the pairs of vertices are connected by edges. So there is no triangle in this region but a single triangle in here. So we considered a graph with two to the fifteenth nodes. >> : So what's the challenge here? What's the classical way of solving it if there is one? >> Martin Suchara: Well, I guess the classical way of solving it -- Oh well, there is an oracle also that tells you what the structure of the graph looks like. So I guess you have to query it. You have to tell it the name of a vertex again and it tells you which are the neighbors. And then you have to scramble this information... >> : What size graph has been done classically, I think it's better to ask? >> Martin Suchara: I believe that for this problem 32,000 nodes is actually pushing the boundary, because this is the original parameter given by [inaudible] from phase one, and we tried to -- or they tried to -- come up with parameters that are pushing the boundary. Yes. >> : [Inaudible] looking for like quadratic [inaudible] speedup here? Or what is the [inaudible]? >> Martin Suchara: I believe the quantum algorithm here gives exponential speedup too. >> : Exponential speedup. Provably? >> Martin Suchara: I'm not sure about provably. I know that for the welded tree it's provable. For the triangle problem I am not willing to bet. >> : It's a binomial number of triples of nodes. You can go through all triples of nodes, just exhaust [inaudible]. >> Martin Suchara: Right, but I believe the oracle will -- All right. [ Silence ] >> Martin Suchara: I will have to check the speedup claim for this algorithm. So for the welded tree it's exponential, and for tree depth 300 this would not be solvable classically. For the ground state estimation, this molecule -- So we originally studied molecules that are more complicated, but actually the gate counts coming out of that are huge. So I decided to analyze a simple molecule. And for this molecule this would be, I believe, a classically solvable problem. So this table summarizes the gate counts that we obtained. So again, I picked the welded tree and triangle-finding problems. They use the quantum random walk, which is fairly efficient, so the gate counts coming out are not too bad. And also for ground state estimation we are using a simple molecule. So we have ground state estimation at ten to the twelfth gates. The number of qubits needed to do the calculation: it's sixty. These are the logical qubits. And then we have parallelization factors which tell us on average how many of the gates can be done in parallel in the circuits. In my opinion there is scope for improvement in these parallelization factors. We don't know if we can lay out the circuits in a better way that is more parallel. And for each of the algorithms we not only have the total gate count but we also have a breakdown by gate type. So this is the ground state estimation algorithm. So we know how many state preparations we need, how many [inaudible], controlled-NOT, S and Z gates and measurements. And this is the information that feeds into... >> : [Inaudible] go back? >> Martin Suchara: Yes. >> : You're able to do ground state without any rotations?
[ Silence ] Who derived the circuit? Maybe that's a better way to put it [inaudible]. Do you know where it came from...? >> Martin Suchara: Oh, so I think actually the Z rotations are [inaudible] rotations. I think that's misleading. >> : Okay. Okay. Yeah, it's arbitrary and Z, that's fine. There are some Z's really. >> Martin Suchara: Yes. That's -- That's a misleading choice of --. >> : Okay, that's fair. I was like, you can do that without rotations. Okay. >> Martin Suchara: So for the actual analysis of the error correcting codes, what is the overhead? So far we only saw the number of logical qubits and logical operations. What is the number of physical operations we need to do? So we looked separately at the three concatenated codes and at the surface code. So for both code families we assumed a tiled layout. So we have a 2-D structure that contains the physical qubits. And this structure is divided into tiles, where each tile is in charge of one logical qubit. The tile has enough space to include ancillas. So if we need to error correct the information in that logical qubit, we can do so within the tile. And the third dimension in this space is reserved for the classical control. So what exactly does the layout of the qubits inside the tile look like? We had to determine that separately for each of the error correcting codes because they encode information differently and operations are done differently. And the number of qubits is going to differ inside each tile too. So here is the tile structure for the [inaudible] code at the second level of concatenation. So the first-level tile contains six by eight physical qubits. And then at the second level we use six by eight of these level-one blocks to construct the tile. And we have to choose a sufficient level of concatenation to guarantee a high probability of success of the calculation. So I chose a cutoff of 50% success probability for the calculation and I used the equation, I believe it was originally shown by [inaudible] in a paper, that shows how many concatenation levels we need. And from there we can calculate the physical number of qubits. So what is going on inside the tile during computation? Well, the size of the tile depends on the code. There is existing literature that shows us the location, the optimal location, of the qubits inside the tile and the sequence of operations we need to do to perform error correction as well as logical operations. So existing literature tells us what to do for the Steane and Bacon-Shor codes. But for the C4/C6 code we had to come up with our own tile design. So here is a specific example from the paper of Svore that shows the operations inside the tile for the Steane code. So the tile size is six by eight. And the Steane code uses seven physical qubits to encode one logical qubit. So these red qubits here, the red locations, they're the locations where the data lives. And then here in the interior of the tile we have some ancillas that are needed in order to do syndrome extraction and error correction for the Steane code. So this determines the location of the data qubits, the ancillas and also the sequence of operations. So the paper of Svore actually shows the exact gate sequence of all the gates that need to be performed to do error correction as well as all other operations. This is a snapshot of the tile at a particular moment in time doing the error correction. And these arrows represent physical gates that are being done. So these are SWAP gates and controlled-NOT operations.
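Returning to the choice of concatenation level mentioned a moment ago: here is a minimal sketch of that calculation, assuming the standard threshold-theorem scaling p_L = p_th (p / p_th)^(2^L). The talk does not specify the exact equation it used, and the parameter values below are purely illustrative.

#include <cmath>
#include <cstdio>

// p      : physical error rate per gate
// p_th   : threshold of the concatenated code
// gates  : number of logical gates in the algorithm
// budget : allowed overall failure probability (e.g. 0.5 for the 50% cutoff)
int levels_needed(double p, double p_th, double gates, double budget) {
    for (int L = 0; L <= 10; ++L) {
        double p_L = p_th * std::pow(p / p_th, std::pow(2.0, L)); // error per logical gate
        if (gates * p_L < budget)   // crude union bound over all logical gates
            return L;
    }
    return -1;   // p is not below threshold, or ten levels are not enough
}

int main() {
    // Illustrative numbers only: 1e-5 physical error, 1e-4 threshold, 1e12 gates.
    std::printf("levels needed: %d\n", levels_needed(1e-5, 1e-4, 1e12, 0.5));
    return 0;
}

From the chosen level L, the physical qubit count follows by raising the tile size to the L-th power, e.g. a six-by-eight tile gives 48^L physical locations per logical qubit, in line with the level-two tile just described.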
The layout has to be optimized to minimize the number of operations that need to be done, to ensure good reliability of the circuit and to minimize the amount of movement. So these SWAPs are done because controlled-NOT operations can only be done on qubits that are next to each other. So this is actually a very expensive thing for the concatenated codes, the movement. Our tools: we used recursive equations to express the gate counts for the desired level of concatenation. So we count the number of elementary gates and the time we need to do these operations, taking parallelism into account. We also estimated the additional space needed just to do ancilla state generation at the desired level of fidelity. We used a similar methodology for the topological codes. So as I said earlier, in topological codes a pair of holes represents a logical qubit. So we assumed this layout. We have to ensure sufficient spacing of the holes because loops that connect the holes are logical operators. If we have holes that are too close to each other, there is going to be a low-weight logical operator and the code is going to be error prone. So we calculated a code distance that is sufficient to, again, guarantee that the calculation succeeds with high probability. And we also took into account the movement of the holes during braiding. So we left enough space in between the holes so that another hole can be braided without negatively affecting the error properties of the code. So to obtain the physical qubit count, that is very simple. We just obtain the code distance, and then from the code distance we can get the size of one tile, and we multiply by the number of logical qubits and that gives us the number of physical qubits. For gate counts we started by first calculating the exact running time of the entire computation, and then from there we were able to calculate the number of gates that are needed to do the error correction, which is done all the time on essentially all the qubits. So this is the major component of the total gate count. And then finally we added the small number of additional gates needed to do gates such as measurements, [inaudible] and so on. So, numerical results. These are the results for superconducting qubits which -- So if you recall, this is the technology that has very fast gate times; an average gate only takes 25 nanoseconds. And as far as errors go, this is somewhere in the middle of the road among the technologies I showed you. It's a ten to the minus five error per gate for the physical gates. And this table shows the resources for the three algorithms for the instance sizes we discussed. So you can see that the surface code will result in computation times from months to years for these instances. Here are the gate counts. And this is the physical qubit count. So we need a few million qubits for the computation. The Bacon-Shor code, which is representative of the concatenated codes, requires a much higher computation time. And I think the reason is we need several levels of concatenation, three or four levels of concatenation, to address the gate errors. And that will translate into a huge gate count and then a huge time needed to do the computation. And also the number of physical qubits is going to be high because of all this concatenation. We need to store the information somewhere. Yes? >> : How sensitive is this number? Like if you pushed it to ten to the minus six, would everything lose like three orders of magnitude in the results? >> Martin Suchara: That is a good question. It is sensitive to it...
>> : [Inaudible] state the question. >> Martin Suchara: Oh, yes. The question is the sensitivity to the gate errors. If we change the gate error, will the resource requirements change a lot? And the answer is yes. It is sensitive, because for concatenated codes, for example, there is this very sharp transition. If you go from three concatenations to four concatenations, the resource requirements will blow up by three, four orders of magnitude perhaps. And that's by changing a single number. If you're just at the boundary, then this can cost you a lot. But it's very interesting, and I'm going to discuss this in the next slide, that... >> : I have a question about [inaudible]... >> Martin Suchara: Okay. >> : So in the ground state estimation we sort of agreed here that you need to have various [inaudible] as the gates, right? >> Martin Suchara: Yes. >> : Did you count each one as a simple gate or did you estimate the kind of [inaudible] of decomposition into the [inaudible]? In other words, does it represent a T count involved in the implementation of a simple [inaudible]? >> : Your Z gates, were they one gate or were they a hundred gates or... >> : Right. >> : ...were they [inaudible] equivalent? >> Martin Suchara: [Inaudible] do the decomposition. >> : Of the rotation? >> : You did. >> : Yes. >> : Oh, okay. Okay. So that is the real number of gates? >> Martin Suchara: Yes. Yes. >> : Okay. >> : Right. So what do -- Can I ask what [inaudible]? Yeah, I have a sinister... >> : Yeah, of course. >> : ...reason to ask this question. We kind of improved [inaudible] by three orders of magnitude recently. And I was wondering how much it would have impacted the ground state [inaudible]...? >> Martin Suchara: Possibly. I'm sure there could be improvements... >> : [Inaudible] a big count? >> : No, no, but he had Z up there which was the full rotation gate. So when he goes into here he has to multiply it by ten to the second, ten to the third to get the accuracy [inaudible], even five-bit accuracy. >> : So it didn't actually have the [inaudible]? >> : Right. So the point is you could probably drop three digits; the ten to the twenty-second might drop down to ten to the nineteenth instead if you didn't have to do the -- if you could get three orders of magnitude better on the rotations. Is that what you're getting at? >> : Yes. >> : Yeah. >> Martin Suchara: Yes, I would be certainly interested in learning new techniques to improve the decompositions. >> : It's still up at ten to the nineteenth. I mean, I'd still -- You know, it's a simple molecule. It's still a hell of a lot of gates. >> Martin Suchara: Right. But we have lots of problem instances that had much higher numbers. I am just showing numbers that are human readable. >> : Again along these lines: is it clear that if you -- So you said you took a 50% success rate for the complete algorithm. Is it clear that if you drop that to something like ten to the minus six and you just repeated the experiment like a million times, is it clear that it wouldn't be better than waiting for a thousand years for one [inaudible] 50% success probability? >> Martin Suchara: That's a great question. We haven't done this calculation, and it's possible that we can obtain a lower running time by targeting a lower fidelity of the algorithm and repeating it a few times. Also, there are basically many optimizations that could be done that we haven't taken into account. One such optimization is distillation of the ancillas.
Perhaps if you distill them with a lower -- if your target fidelity for the ancillas is lower, perhaps that will introduce inaccuracies, but since the distillation, again, is this concatenated process, if you use one [inaudible] concatenation in the distillation process, that saves you a lot of time and a lot of gates. So it's possible that these tradeoffs can improve the resource estimates significantly. >> : So in that earlier slide did you have the logical gates and so on? I mean how many -- Suppose there were no errors. >> Martin Suchara: Was it this one? >> : The one -- Yeah, that one. >> : Yeah. >> : That Z is really a rotated Z. It must've expanded out [inaudible]... >> : I understand. So there might be some big number in there. >> : Right. That gets much bigger. >> Martin Suchara: Yes. >> : All the rest of them are sort of what they are, which tells you something [inaudible] to the fourteenth or something plus whatever the Z count -- whatever the T count in Z is and -- ? >> : Yep. >> : So they each count and so -- Yeah. >> Martin Suchara: So the next question that I wanted to ask was if there is some [inaudible] -- So we saw that the topological codes outperform the concatenated codes for the superconductors. So the next question I wanted to ask is whether there is some [inaudible] where actually the surface codes are not performing as well. And the answer is yes. So I considered these three technologies. We have neutral atoms, which have huge error rates on the gates. And then we have superconductors, which have lower errors, and then we have ion traps, which have even lower errors. And, okay, so the result here is very interesting. For the high errors of neutral atoms you can only use the surface code, because the surface code meets the threshold, ten to the minus three, but the concatenated codes just cannot deliver. And [inaudible] where we had -- Oh, and the time to do the calculation is huge because the gate time for neutral atoms is three orders of magnitude longer, slower, than for superconductors. For superconductors we already know that the surface code will win. But then if we decrease the error rate even further for the ion traps, the interesting observation is that the concatenated codes actually do better than the surface code. And the reason is that at these very small error rates we only need one level of concatenation. So error correction with the concatenated code is super cheap. But for the surface code, the duration of the computation is almost independent of the code distance because all the operations in the surface code are done in parallel. All the syndrome measurements are parallelized. All the operations, the braiding, are highly parallel. So what actually determines the running time for the surface code is the duration of the gate. So some key observations: the surface codes are better in most regimes unless you're dealing with very low error rates. CNOT gates are the dominant gates for logical circuits, and as far as physical gates go, CNOT was the most frequently used physical gate for the topological codes. And SWAP is the most frequent gate for concatenated codes. And the reason is that for concatenated codes we need to use SWAPs to move information around so that controlled-NOT operations can be done locally, whereas surface codes do not need the SWAPping. So finally I will just use the last five minutes to describe some thoughts about building a faster decoder for topological codes.
So, so far our only concern was the quantum resources, but we have to keep in mind that for the surface code we will also need a classical controller that decodes the errors that occur. So we know that we need to be able to solve a problem that has millions of physical qubits. Also we know that syndrome measurements will be inaccurate, so we will need to decode errors after several rounds of syndrome measurements. And we know that we like technologies with low gate times, because the surface code works well with technologies with low gate times such as superconductors. So we would really love to have a decoder that works in real time. And this is work that I'd like to do in the coming months. I started looking at the decoding problem. To decode errors for the surface code we need to solve the minimum weight matching problem to match pairs of syndromes. And one key observation -- So even though this problem is [inaudible] solvable -- we would typically use Edmonds' matching algorithm to solve it -- it is going to be too slow for the problem sizes we are considering here. So one observation that is due to Fowler is that we don't need to solve the most general minimum weight matching problem; we can prune certain edges from consideration. So here is an example. We have a vertex V and we want to match it to some other vertex. And Fowler's observation tells us that points in space cast shadows, and if point number one is here, it will cast a shadow here. Point three will be shadowed by it. So for vertex V we only need to consider the edge from V to point one, but we can ignore the edge from V to vertex number three. And the reason is that you can show that, assuming a minimum weight matching would use the edge from V to point number three, you could find an alternate matching that would have lower weight. So now this is a two-dimensional picture and, of course, we need to solve the problem in three dimensions. It turns out that in two dimensions there is a very simple linear time algorithm that can prune the edges, prune the shadowed edges. But I did not manage to find a counterpart in three dimensions. In fact, I believe it's unlikely that such an algorithm exists. But I found a heuristic that allows us to prune the edges in linear time with very favorable results. And the heuristic works as follows: we are given the vertex V and we want to find the candidate edges. So we will look in the four geographic directions for the closest point, and then we will create a bounding box in these two dimensions. And we know that any point that vertex V will connect to is going to be inside of this bounding box. And then we look at points in the third dimension, perpendicular to this screen, and we find the closest such point in each direction. And we connect vertex V to these points. So I looked at the resulting average degree of a vertex in the three-dimensional graph that I obtain by simulating the surface code. And so here is the number of qubits in the surface code, so I went up from a hundred qubits to one million qubits. And I looked at the average degree of a vertex after the pruning. And it seems that the number approaches about 180. So we have -- In the limit we will have a constant number of edges per vertex. And I tried to solve the minimum weight matching problem on just my laptop computer and see how long it takes to do the matching. And these are the results. And you can see that the scaling is also linear.
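As a rough sketch, one possible implementation of this kind of bounding-box pruning is shown below. The Event structure, the Manhattan distance measure and the tie-breaking are illustrative choices rather than details taken from the talk, and the naive scan shown is quadratic overall, whereas the implementation described above is linear.

#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>

struct Event { int x, y, t; };   // a syndrome change at spatial position (x, y) and time t

// Candidate partners for event v: everything inside the spatial bounding box
// spanned by v's nearest neighbours in the four spatial directions, plus the
// nearest neighbour in each time direction.
std::vector<int> candidate_partners(const std::vector<Event> &ev, int v) {
    const Event &a = ev[v];
    int best[6]  = {-1, -1, -1, -1, -1, -1};   // +x, -x, +y, -y, +t, -t
    int bestd[6] = {0, 0, 0, 0, 0, 0};
    for (int i = 0; i < static_cast<int>(ev.size()); ++i) {
        if (i == v) continue;
        const Event &b = ev[i];
        int d = std::abs(a.x - b.x) + std::abs(a.y - b.y) + std::abs(a.t - b.t);
        int dir[6] = {b.x > a.x, b.x < a.x, b.y > a.y, b.y < a.y, b.t > a.t, b.t < a.t};
        for (int k = 0; k < 6; ++k)
            if (dir[k] && (best[k] < 0 || d < bestd[k])) { best[k] = i; bestd[k] = d; }
    }
    // Spatial bounding box spanned by the four nearest spatial neighbours.
    int xmin = a.x, xmax = a.x, ymin = a.y, ymax = a.y;
    for (int k = 0; k < 4; ++k)
        if (best[k] >= 0) {
            xmin = std::min(xmin, ev[best[k]].x); xmax = std::max(xmax, ev[best[k]].x);
            ymin = std::min(ymin, ev[best[k]].y); ymax = std::max(ymax, ev[best[k]].y);
        }
    std::vector<int> keep;
    for (int i = 0; i < static_cast<int>(ev.size()); ++i) {
        if (i == v) continue;
        bool in_box = ev[i].x >= xmin && ev[i].x <= xmax &&
                      ev[i].y >= ymin && ev[i].y <= ymax;
        if (in_box || i == best[4] || i == best[5]) keep.push_back(i);   // candidate edge
    }
    return keep;
}

int main() {
    // Tiny example: the far-away event (9, 9, 0) is pruned from vertex 0's candidates.
    std::vector<Event> ev = {{0, 0, 0}, {3, 1, 0}, {9, 9, 0}, {0, 1, 4}};
    std::vector<int> c = candidate_partners(ev, 0);
    std::printf("vertex 0 keeps %zu candidate edges\n", c.size());
    return 0;
}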
So for one million qubits it takes a bit over a minute to construct the graph, generate the errors and prune the edges. And then it takes about seven or eight minutes to do the matching itself. Yes, there is a question. >> : You've listed the number of qubits, but what is the error rate at which you're running your simulation? Because that has a big effect on the number of vertices you need to pass to the graph. >> Martin Suchara: I was running fairly close to the threshold -- at about [inaudible] of the threshold. If the error rate was much lower then I think the result would be -- we could decode errors for a higher number of qubits because there would be just fewer syndromes. But probably, I think, the statistical distribution of the points in the space would be very similar. So I think the degree of a node would actually be the same. Yes, another question. >> : Performance of the decoder, because now you're using this heuristic to parallelize your algorithm, if I understand you correctly? >> Martin Suchara: So this is not parallel yet. This is... >> : But it can be. >> Martin Suchara: But it can be. And I'd like to look at parallelization... >> : So [inaudible] possible using this [inaudible], right? >> Martin Suchara: So it is certainly possible. I actually have a parallel implementation. I think I have it in the next slide. I have a parallel implementation for the pruning, which is easier to do. I mean the pruning is just a very simple heuristic. I don't have a parallelization for the matching itself, which uses the Edmonds algorithm. It uses primal and dual updates. >> : [Inaudible] which does this trick of creating these boxes around the vertices and having these smaller graphs. Does that affect the performance of [inaudible]? >> Martin Suchara: Oh yes. It's very significant. I would not be able to do -- I would probably have to stop at somewhere between a thousand and ten thousand qubits if I didn't do the pruning. >> : Sorry, I didn't mean in real time. I meant in probability of decoding errors. >> Martin Suchara: It doesn't affect the probability of errors because we are still solving the exact same problem. We can show that we only prune edges that must not be in the minimum weight matching. So in conclusion, I did some work on topological quantum error correction. I helped develop two new quantum error correcting codes in this space, the five-squares code and the triangle code. And in the past year I worked on the quantum computer science project of IARPA to estimate the resources required to run a variety of algorithms on realistic quantum machines using four families of quantum error correcting codes. And currently I'm looking at the problem of parallelizing a decoder for topological quantum error correcting codes. Thank you. [ Audience applause ] >> Krysta Svore: Questions? >> Martin Suchara: Yes. >> : I don't know if there's a simple answer to this. If not, we can take this offline. But what about the bottom and top boundaries when you decode a surface code with all the measurements? Can you take into account preparation errors? And what about the last layers of measurements? Do you have to kind of -- So I guess the point is that you measure many times to be more sure about the syndrome outcome. But the last measurements you did might have errors which you're not sure are good, so you want to decode what has happened as best you can in some sense. So how do you deal with this issue? >> Martin Suchara: So I think there are two or perhaps three schools of thought on how to deal with this.
One of them says, regardless of how you solve this problem, you could just do a dummy round of perfect measurements and then run your algorithm on this instance, with the knowledge that in reality you would do something else, such as do the error correction maybe ten rounds behind schedule, behind the actual measurements. The second school of thought would be obviously to simulate exactly what would be going on in the real system. Which is perhaps this: you would do the decoding with some lag. And the third school of thought is to use a periodic boundary condition in the time dimension. So you would connect, essentially, points to the boundary. Yes, [inaudible]. >> : So just a comment. I think you collaborated with all three of us on this project. But... >> Martin Suchara: Yes. >> : On the surface code estimate, you had some high numbers. And I just want to say that for some of the distillation procedures we assumed a very conservative model of how we were distilling magic states. And so some of those numbers I think you showed could be reduced fairly significantly with a penalty in area [inaudible]... >> Martin Suchara: I changed that already, actually. >> : Oh, okay. >> Martin Suchara: Right. So we had earlier results that were more pessimistic because we used a conservative assumption that only a single CNOT would be done in the surface code at any given time. But for the numbers I showed right now, I tried to parallelize all the CNOTs that I could, including in the state distillation. >> : Yeah, but in our state distillation we... >> Martin Suchara: And, yes, we need a larger number of qubits than originally predicted. That is correct. >> Krysta Svore: Thank you again. [ Audience applause ]