>> Krysta Svore: Okay, so welcome everyone. Today we have Nathan Wiebe visiting the [indiscernible] group. He is a candidate for a researcher position in [indiscernible]. He is currently a post-doctoral fellow at the Institute for Quantum Computing at the University of Waterloo. Today he is here to talk to us about progress towards a killer application for quantum computation. So, Nathan, thank you for coming.
>> Nathan Wiebe: Great, thank you very much for the introduction Krysta. So as Krysta said I am going to be talking to you today about the quest to find a killer application for quantum computation. The idea for this talk came from a question that was actually asked of me by two different colleagues at two different conferences. This question is, “If I was given a quantum computer with 100 qubits in it, what would I use it for?” And this question got me thinking and made me wonder whether we have actually found something that realistically is a killer application for quantum computation yet, and if not, what would such an application ultimately look like? Now the notion of what a killer application is, is to some extent sociological. It depends on what problems you think are interesting or what problems you think other people find interesting. So in order to preface the opinions that are contained in this presentation I really think it's necessary to talk about the sorts of research problems that I personally find interesting. My research touches a bunch of different areas involving quantum simulation, adiabatic quantum computation, quantum algorithms, numerical analysis, and foundations of physics and Bayesian inference. So it might come as a bit of a surprise, given this, that to me my vision of a killer application looks something like this. This is VisiCalc, a program that is widely regarded to be the first killer application for home computers. This simple application single-handedly justified, for many people, the purchase of a very expensive home computer in 1982. So from my perspective something that wants to call itself a killer application for a quantum computer should spark people's imaginations at least as much as VisiCalc did. If it doesn't then I would not consider that to be a killer application. So in this talk I am really going to put candidates forward for potential killer applications and then discuss whether or not I feel that these are adequate or close to being killer applications. And these two candidates that I am putting forward are: quantum simulation and quantum data fitting. The outline of my presentation will involve an introduction to quantum simulation, and then I am going to present a new method based on nonunitary simulation that was developed by myself and Andrew Childs that improves upon existing methods used in, for example, quantum chemistry simulations. Then I am going to discuss data fitting. In particular I am going to give an introduction to the linear systems algorithm on which the data fitting work is based. Then I am going to present the algorithm, after which I will conclude and discuss whether either of these actually is a killer application and, if not, how close we are. Quantum computation really began with thoughts from Richard Feynman and Yuri Manin that suggested that quantum information could be stored and manipulated in quantum systems themselves. Now this actually doesn't sound, on the surface, like all that deep of an insight.
After all, Moore's law has caused us to miniaturize, to get components smaller and smaller, so this does sound like it may just be a replication of the same theme, just smaller. But actually it's a fundamentally different idea. The reason why is because of the fact that the information has to be stored in a manifestly quantum fashion. That is, in a way that defies a classical description. So the way that this is generally done in quantum computation is that we have bits that are then generalized into quantum bits or qubits. And unlike classical bits these quantum bits can be an arbitrary superposition of zero and one. So the difference between this and maybe a full analog computation, well one of the differences at least, is the fact that when you measure this you get a discrete outcome. So every time you measure the quantum bit it probabilistically will either be 0 or 1 and the probabilities depend on the amplitudes a and b. You could of course repeat the same idea with more quantum bits. So with 2 quantum bits your system can now be in an arbitrary superposition of 4 different values. And obviously you can imagine that if you had 3 qubits then it can be in an arbitrary superposition of any of 8 different combinations of 0 and 1. So that's how this works. And of course a quantum computer also has to be able to transform these quantum bit strings, at least in principle, into any other possible quantum state within the quantum computer. So the question is: where does quantum computing potentially get its advantages over classical computation? And there are many different candidates that one could use for this. But to some extent one of the biggest things is this exponential growth of the amount of information that's required to represent the quantum state. To give you an idea about how these exponentials end up scaling I have given the following diagram. Imagine that you had a single quantum bit and you could write down all the information you needed in order to represent that quantum bit within some fixed error tolerance on a single grain of sand. Then of course the relevant unit is, “Well how many qubits would you need in order to make a pile of sand the size of a Tyrannosaurus Rex?” And that turns out to be 48 qubits. If you want to look at a quantum computer that has twice as many qubits, 96, then the pile of sand you would need to represent that quantum state would have a volume of about a small moon. Going up again by a factor of 2, that pile of sand would have to have the size of a small planetary nebula. So, obviously, this problem that I began with, this question of what I would do with a 100 qubit quantum computer: this is not a trivial computational device. This device would require a lot of classical information in order to specify what's going on inside the system. So you ought to imagine that quantum computation should be useful for a host of different problems. And a number of algorithms are known for which speedups over the best known or best possible classical algorithms exist. And of these I am going to focus only on those that offer superpolynomial or exponential speedups. The reason is that obviously we really want something that is much better than a supercomputer. So of these there are a number of examples. And the two that I consider most promising are of course quantum simulation and linear systems algorithms. And I am going to be discussing those in detail later on. So let's start with quantum simulation and the quantum simulation work that Andrew and I have done.
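To make the amplitude picture and the exponential-scaling point above concrete, here is a minimal numpy sketch; the amplitudes chosen for the single qubit are arbitrary, and the qubit counts simply echo the sand analogy (this is an illustrative aside, not something from the slides):

```python
import numpy as np

# A single qubit a|0> + b|1> is a normalized 2-component complex vector.
a, b = 1 / np.sqrt(2), 1j / np.sqrt(2)      # arbitrary illustrative amplitudes
qubit = np.array([a, b])

# Measuring gives the discrete outcome 0 or 1 with probabilities |a|^2 and |b|^2.
probs = np.abs(qubit) ** 2
print(probs)                                 # [0.5 0.5]

# n qubits require 2**n complex amplitudes, which is where the exponential
# classical cost of representing the state comes from.
for n in (1, 48, 96):
    print(n, "qubits ->", 2 ** n, "amplitudes")
```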
So the idea of quantum simulation is actually really straightforward at its core. What you have is you have a quantum computer, which intrinsically is a quantum system. And you have the ability to perform arbitrary transformations on the quantum state inside that quantum system. So in principle other physical systems are themselves quantum systems. So you would imagine that the transformations that, say, a chemical system is naturally undergoing could be performed on a quantum computer on the logical level. Similarly the quantum transformations that occur in condensed matter systems ought to be emulated by a quantum computer of an appropriate size. And this is the idea. You basically use the fact that a quantum computer is a universal device to emulate the natural quantum evolutions that exist in other systems. And you can actually do this efficiently, whereas on classical computers there is no known way in general to simulate arbitrary quantum systems efficiently. So you can do some kind of amazing things if you were able to get a 100 qubit scale quantum computer. You would be able to simulate reaction dynamics and learn spectral properties of chemicals without ever having to synthesize them. You could use these devices in order to investigate models for exotic superconductivity or look for quantum phase transitions in certain condensed matter systems. There are many different possible applications for hard problems scientifically that quantum simulation could be used to address right now, and important ones too.
>> Will there be some questions that you cannot answer in a regular ordinary way of [indiscernible]?
>> Nathan Wiebe: So what do you mean, sorry?
>> You said you can answer some questions about chemicals without synthesizing them.
>> Nathan Wiebe: Right, right.
>> But I would imagine there are some questions that you cannot solve even by synthesizing these chemicals.
>> Nathan Wiebe: Right, right. I was specifically referring to spectroscopic properties. I mean obviously if you synthesized it then you could just throw it through whatever spectrometer you wanted and then look at that. But you couldn't get information about reaction dynamics or maybe more sensitive pieces of information from it. So one of the key things that you can do with quantum simulation is that, if you do this on a quantum computer, the quantum computer can perform error correction and it can actually certify that the answer that you get out of this quantum simulation is actually valid. So that's one of the very neat things that you can do with quantum simulation that you may not be able to do with other approaches that try to build analogous experiments, say, to one of these systems. And also the number of logical qubits that you need in order to equal or exceed the power of a conventional supercomputer isn't very big. You only need something on the order of 40ish qubits in order to start getting to the regime of existing supercomputers. And also unlike many of the other quantum algorithms that have been proposed so far this family of algorithms solves a host of problems that scientists actually care about deeply right now. And a lot of processor time is spent currently on solving these sorts of problems.
>> So can you clarify what you mean about error correction and certification? Maybe you can give us an example of what you mean by that.
>> Nathan Wiebe: You know what, actually the next slide probably gives me a better prop for describing this. So --.
>> [inaudible].
>> Nathan Wiebe: Yeah, sure Matt.
>> Maybe you are going to talk about this in a bit because you said you were going to talk about [indiscernible] theories. Two things I was curious about when you mentioned simulating small molecules on the computer. The first question is that on a quantum computer we can simulate time dynamics, but we might be interested also in having to cool the molecule to its ground state and this might [indiscernible]?
>> Nathan Wiebe: Well there are several ways. Technically there is actually no overhead with bath qubits that are required, because you can always in principle simulate an adiabatic evolution to cool it into its ground state and you don't require any additional qubits for that. As for the question about the bath, in the worst case scenario you require one additional qubit. You can effectively do a single qubit phase estimation algorithm and use that to cool it.
>> And then a separate thing is whether you might try to do, well I am just wondering how you are going to encode [indiscernible]? You know 100 qubits in terms of a spin system, that's great. That's like 100 spins, but in terms of a system of electrons that's not actually that many degrees of freedom. I mean if you have [inaudible].
>> Nathan Wiebe: Sure, I mean it really depends on the way that you are doing this. So for example if you are talking about simulating a quantum system in some first quantized form then the way that you would have to probably encode that is you would begin by saying, “Okay, well the electron cloud is distributed according to say an S orbital at the beginning of that”. Then you would encode your information efficiently with that. And then you would require a quantum algorithm to prepare the initial state as an S orbital and then evolve in first quantized form. That of course isn't terribly efficient. A more efficient way of doing it is to, if possible, map it to a second quantized form and use creation and annihilation operators to deal with that. Then you only have to abstractly mention the state at the beginning of the algorithm that the electrons are in.
>> So I am just wondering, and maybe we can talk about this later, in terms of first quantized, second quantized, the [indiscernible] sounds huge, but at the same time when you ask how many orbitals electrons can have and you count the number of possible spaces for you in a fairly small [indiscernible] it's already getting up there to [indiscernible].
>> When you get into a second quantized form you only get 50 orbitals. At best you still need to leave some things off for phase estimation [indiscernible]. So that's the size we can still do classically.
>> Right, so I am wondering --.
>> You are [indiscernible] on the edge, but the number I tend to use is a good 200 at least because now you are beyond what you can do classically.
>> Nathan Wiebe: Yeah, these sorts of 40-100ish qubits are certainly for spin systems. That's going to be where sort of the state of the art is, I think. Whereas yeah, there are certainly overheads that come in with the chemistry simulations, but we can discuss that later. So going back one more on the stack to the question about the certification and why I was bringing that up. One of the big trends in experimental physics these days is to look at so-called analog simulators. And this is justified by the fact that building a full quantum computer seems to be a very difficult thing to do. But quantum mechanics is just something that happens naturally in the lab.
So why not just construct a gigantic lattice of hundreds of interacting atoms and call that a quantum simulator? Well, actually it is quantum and it is simulating something. But the question is: is it simulating something other than itself? And if you want to use this in order to answer a computational question what do you need to do? In this case, certifying this system of several hundred qubits, which was taken from the ion trap group at NIST, what you need to do, at least on a naive level, is first understand what the state is, and that may require tomography, which is going to be exponentially expensive. So there is no known way of getting these sorts of simulations, which are already very complicated and very sophisticated, to actually solve genuine problems. So that's one of the reasons why I emphasize the certification issue, because you know this is a false dichotomy where you say, “This digital simulation experiment that people have done is only 6 qubits, why are they bothering when you can do a few hundred qubits?” Well the reason is because to some extent you can trust what the output of these sorts of experiments is. Whereas with this one, who knows.
>> I see, so what you are saying is because you can break a quantum computation up into very small parts and then do an algorithm then you can certify. Rather than having a large system that you just say does something.
>> Nathan Wiebe: You just pray that according to your understanding of the laws of physics it ought to be the system that you think it is.
>> As just a side note, Burt and I have been looking at tomography as a [indiscernible] map. [indiscernible].
>> Nathan Wiebe: I am going to get to that point with my data fitting algorithm later. So that's, enough said. So let's talk about basically how this emulation process inside a quantum computer actually works out. There are several steps. The first thing is what you have is you have this quantum system, and this is the thing that you would like to simulate. And you have your quantum computer, which is the device that you are using to simulate it. In general you are going to require more qubits for your quantum computer than the quantum system, but for spin systems it turns out they can actually be the same size and [indiscernible] often. So the way that the algorithms begin is you want to simulate the evolution of some initial state forward in time to a final state in the actual quantum system. You begin by taking that state and encoding it as a qubit state inside your quantum computer that's logically equivalent to the initial state. Then this continuous time evolution that you would see in the physical system you approximate by a series of discrete gate operations, which map the system to a final state. And that final state ought to be logically equivalent to the evolved state up to some error tolerance. And that's how these algorithms fundamentally work. And if you design this right then you will be able to know beforehand what that error can be in a worst case scenario. So unfortunately this isn't the end of it, because getting the final state as the outcome of this thing is like getting a fortune cookie that contains the answer to all of life's problems in it. It doesn't really mean too much until you extract information from it. So that final quantum state that you get here, it doesn't solve any problems. What you have to do is you have to measure it, which destroys the state, and then start the simulation protocol all over again to get more information.
And repeat this potentially a large number of times in order to learn the information that you wanted about that quantum system. Yes?
>> So concerning certification: since a quantum computer makes errors as well [inaudible].
>> Nathan Wiebe: You know, if testing is available, so let's say you actually have the quantum system, you are entirely right: because of the strong analogy between the quantum computer and the other system, yeah, why not. If you have got the actual system the very best simulator of that system is the system itself. If you want to learn something about it just experiment on it. But the problem is, the problem is, you don't often want to use simulation in order to understand a physical system. What you want to understand is a mathematical description of a physical system. And that's what a quantum computer can do. A quantum computer can say that if you designed this simulation algorithm properly then the mathematics that describes this time evolution should be simulated accurately by the quantum system.
>> You can also ask questions you can't do with a physical system.
>> Nathan Wiebe: Yes, and that's the secondary benefit. There are certain experiments that would at least be impractical to do in a physical system. So I will just do a very brief example of a spin system type problem, to give you an idea about how the simplest variants of these quantum simulations work. So this is a model known as the Transverse Ising model for 2 interacting spins. It's a model used in condensed matter physics to describe quantum magnetism. And the basic matrix you could just imagine is a 4x4 matrix that has an interaction between the zed components of these two quantum spins. And also it has an interaction with an external magnetic field that's pointing in the X direction. So if you want to simulate that, it boils down to the question of how do you end up taking this quantum mechanical time evolution operator, which generates say this transformation, and turning it into a series of gate operations on the quantum computer? And well, in this case I guess it's a 4x4 matrix so you probably could do it directly, but in general for higher dimensional systems it's hard to actually synthesize this in a straightforward way. So the way that this is done is using Trotter decomposition. So the idea is that this Hamiltonian is the sum of 3 terms. Each of these it turns out can be efficiently implemented on a quantum computer, but together it's not clear how you would do it. So what you do is you say, “Okay, we break up this time evolution into a series of very short time steps and in each of these time steps you evolve only according to one of these 3 terms”. And by increasing the number of time steps you can make this approximation arbitrarily accurate. So if you do that you can actually find a quantum circuit that's equivalent to these operations and this quantum circuit is down here. But unfortunately you aren't done, because there are actually two halves to these sorts of simulation algorithms. The first half is what I have described here. It's this process of Trotterization and breaking it into elementary rotations. The second half is converting, or compiling, these elementary rotations into fundamental gates that the quantum computer could actually use. There are a number of different gate sets that can be chosen, but common choices are pi by 8 gates and Hadamard gates. There are many methods that are known to synthesize these single qubit rotations that appear in this sort of a circuit.
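As a rough illustration of that Trotterization step for the two-spin Transverse Ising model, here is a minimal numpy sketch; the coupling J, field strength B, evolution time t and number of Trotter steps r are arbitrary values assumed for the example, not numbers from the talk:

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I = np.eye(2, dtype=complex)

J, B, t, r = 1.0, 0.5, 1.0, 100              # assumed coupling, field, time, step count

H1 = J * np.kron(Z, Z)                       # interaction between the z components
H2 = B * np.kron(X, I)                       # field in the x direction on spin 1
H3 = B * np.kron(I, X)                       # field in the x direction on spin 2
H = H1 + H2 + H3                             # the 4x4 Transverse Ising Hamiltonian

exact = expm(-1j * H * t)                    # the time evolution we want to emulate

dt = t / r                                   # evolve under one term at a time
step = expm(-1j * H1 * dt) @ expm(-1j * H2 * dt) @ expm(-1j * H3 * dt)
trotter = np.linalg.matrix_power(step, r)

# The error shrinks as the number of time steps r grows (first order Trotter).
print(np.linalg.norm(exact - trotter))
```

Increasing r makes the approximation arbitrarily accurate, at the cost of more elementary exponentials, which is exactly the trade-off the higher order formulas discussed next are designed to improve.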
And of course the groups at Microsoft Research are world leaders on these sorts of techniques. Now let's talk about what the state of the art methods are at present in quantum simulation. The most common approaches that are often used are these product formula approaches, which are exactly the same sort of thing that I showed you with this example of the spin system. It's just more sophisticated; they often will use higher order Trotter formulas than the one that I presented. They are very good for high accuracy simulations of extremely sparse Hamiltonians. More recently methods based on quantum walks have been developed, largely by Dominic Berry and Andrew Childs. And these methods are unfortunately not as good for high accuracy simulations, but they are much better for nonsparse Hamiltonians. The methods that I am going to be discussing in this talk are multi-product based methods, which actually are superior to these product formula approaches in almost every way. I am sure the group will be able to tell me very quickly what the one way they are not superior is, but I will leave that as a surprise. So with product formulas the best known scaling for product formulas is this: here M is the number of elementary Hamiltonians that you have in your Hamiltonian. Again, the Hamiltonian is like your quantum energy operator. So it scales quadratically with the number of elementary terms that you have in your Hamiltonian and it scales nearly linearly with the norm of the Hamiltonian times the evolution time.
>> [inaudible].
>> Nathan Wiebe: Yeah, it's the maximum norm of all the little [indiscernible], so M times that will clearly be an upper bound on the norm of the Hamiltonian.
>> Are each [inaudible], are they supposed to be in some certain way not interacting with each other?
>> Nathan Wiebe: Oh no, they can interact with each other. But the point is that each of these individual terms has to be individually simulatable. Okay, so they don't necessarily have to commute with each other. So the final thing over there is the error tolerance. And these algorithms scale sub-polynomially with the error tolerance that you want out of the simulation, but unfortunately not polylogarithmically. So that's the basic intuition behind this. And in order to get this sort of performance you can't just use the basic Trotter formula that I showed previously. You really have to choose higher and higher order Trotter formulas as the evolution time and the error tolerance end up becoming more stringent.
>> What's the big O tilde?
>> Nathan Wiebe: Oh, big O tilde, what I have done is drop terms that are logarithmic in here. Just to make the expression look nicer.
>> [inaudible].
>> Nathan Wiebe: Yes, most [indiscernible] logs.
>> Why are you saying this is sub-polynomial?
>> Nathan Wiebe: The reason why is because obviously if I didn't have this square root here it would just be like e to the log one over epsilon. So that would be, you know, one over epsilon. But when you have the square root on here this actually makes it smaller than any polynomial function. So that's why this is sub-polynomial. So the basic intuition for how you generate these high order Trotter formulas is like this. Say you have your time evolution operator and what you want to do is you want to write this as a product of elementary time evolutions that you can actually carry out.
And you want to choose the times for these elementary evolutions so that you reconstruct the Taylor series of the actual time evolution operator within some prescribed error. It is actually a hard task to try and find all of these different times that are necessary in order to actually reconstruct the Taylor series. There is actually a cottage industry in numerical analysis for finding different times in order to do this. However, there is fortunately a very nice recursive scheme that Suzuki invented in order to refine a low order approximation into a higher order approximation. And this iterative scheme basically works as follows: you start with a low order approximation and you do two time steps forward with that low order approximation. Then you do one time step back for a certain value of time. Then you do two more time steps forward. And with this two forward, one back, two forward procedure, by choosing this single parameter rather than all of these different parameters individually, you can actually find a neat way of guaranteeing that you will increase the order of accuracy by two, at the cost of increasing the number of terms in your approximation by a factor of 5. So the trade off is you have 5 times as many exponentials, but you get two more orders of accuracy out of doing this.
>> [inaudible]
>> Nathan Wiebe: Sure.
>> Does t tend to 0 somewhere?
>> Nathan Wiebe: Yes it does. So here what I am doing is I am just talking about analyzing one of the short-time slices for the time evolution. If you are looking at a long-time evolution then imagine just taking R slices of that and making each of them short, and then you use one of these high order approximants.
>> So the slice is tiny, but it also gets partitioned some more.
>> Nathan Wiebe: Yeah, it gets partitioned some more into these smaller bits out here. And so I would just like to give you guys a visual way of understanding how these product formulas work, because our approach in contrast is going to do something very different. So here these boxes, what they do, this represents a Taylor series. This first box is the 0th order term in a Taylor series for the time evolution operator. This is the 1st order, this is the 2nd order and then these two have errors in them. And what the Trotter-Suzuki formula does is it combines them in an appropriate way such that when you multiply these boxes together these errors over here end up getting a negative sign because of the fact that you are looking at a backwards in time evolution. The products between all of these end up causing these terms to interfere with each other and cancel out. And a symmetry consideration ends up causing these high order terms to also cancel out when you put it in this form. So that's basically how this works. One of the big drawbacks though, if you just take a look at what happens with the errors whenever you multiply, is that every time you multiply, new types of errors are created by multiplying terms that were previously correct with error terms. So for example let's take a look at the second order error terms here. These can actually be formed, or error terms of that scale can be formed, by multiplying this correct term by that incorrect term. And thus you will generate a more complex set of errors through multiplication than you would otherwise.
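Before returning to the error-cancellation picture, here is a sketch of that Suzuki recursion, two steps forward, one step back, two steps forward; the base case is the symmetric Trotter splitting, and the random Hermitian terms at the end are assumed test data used only to check the gain in accuracy:

```python
import numpy as np
from scipy.linalg import expm

def symmetric_trotter(H_terms, t):
    """Second order (symmetric) Trotter formula: half-steps forward, then reversed."""
    halves = [expm(-1j * H * t / 2) for H in H_terms]
    U = np.eye(H_terms[0].shape[0], dtype=complex)
    for V in halves:
        U = V @ U
    for V in reversed(halves):
        U = V @ U
    return U

def suzuki(H_terms, t, order):
    """Suzuki's recursion: order 2k -> order 2k+2 at the cost of 5x more exponentials."""
    if order == 2:
        return symmetric_trotter(H_terms, t)
    k = order // 2
    p = 1.0 / (4.0 - 4.0 ** (1.0 / (2 * k - 1)))      # the single parameter to choose
    lower = lambda s: suzuki(H_terms, s, order - 2)
    forward = lower(p * t)
    # two steps forward, one (effectively backward) step, two steps forward
    return forward @ forward @ lower((1 - 4 * p) * t) @ forward @ forward

# toy check on assumed random Hermitian terms
rng = np.random.default_rng(0)
terms = []
for _ in range(3):
    A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    terms.append((A + A.conj().T) / 2)
H, t = sum(terms), 0.1
print("order 2 error:", np.linalg.norm(expm(-1j * H * t) - suzuki(terms, t, 2)))
print("order 4 error:", np.linalg.norm(expm(-1j * H * t) - suzuki(terms, t, 4)))
```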
So a lot of the effort, you can imagine conceptually, in the Trotter-Suzuki formula is to actually counteract these errors that are introduced by multiplication and deal with the fact that multiplying polynomials isn't a very natural way to build a Taylor series. The natural way to build a Taylor series is to add them. And that's exactly what we do. So we suggest doing something very different; don't multiply. Start with your lower order formulas, come up with some weighted sum of these lower order formulas and you add them together in an appropriate way to make the Taylor series that you want. And this is very natural because of course with Taylor's theorem you just add the individual terms in the Taylor series expansion to construct it anyways. So this doesn't create this problem of propagation of errors. This sort of an approach has already been known in the numerical analysis community for quite some time. Richardson extrapolation is the simplest example of these sorts of approximation methods. And in general we can construct multi-product expansions that work by adding together many low order product formulas with different coefficients out here and construct our approximations that way. That's the method that we use here. So rather than these massive product formula approximations we add together a bunch of them with different coefficients in order to approximate the time evolution operator within some accuracy. There are some advantages to doing this. The first key advantage is the number of exponentials that you need to create the formula using Trotter-Suzuki, you will notice, grows exponentially with the order, and that's exactly for the reason that I mentioned. Every time you increase the order recursively you need 5 times more exponentials, hence the 5 to the K minus 1. Whereas with multi-product formulas you only end up needing order K squared terms, which is fantastic. Of course there is something that is very un-fantastic about this as well. And that's the fact that although the Trotter-Suzuki formula is unitary, which means it is an ordinary quantum transformation that can easily be done on a quantum computer, these formulas in general are not unitary. So you have to go through greater effort on a quantum computer in order to try to synthesize them. However, we end up discovering ways to do so and actually find, surprisingly, that you can use non-unitary operations on a quantum computer to simulate unitary dynamics more efficiently than you could by using the unitary dynamics by itself. So the way we do this is using the following circuit. This circuit is designed in order to create a linear combination of two different unitaries. So effectively sums of two unitaries with an arbitrary coefficient kappa in front of it. And the way that you do this is you take this single qubit quantum transformation here, you apply it to the input bit and perform these controlled evolutions, where here U0 and U1 you could think of as just two different operator splitting formulas. This one might take one time step and this one could take two time steps, but they are the same formula. And if you measure this and observe 0 then you will have actually performed the correct linear combination that you want; whereas if you measure 1 then you won't and you might have to perform some error correction. This can also be generalized pretty straightforwardly to adding more than two terms just by using a larger unitary on many qubits.
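Here is a minimal numpy sketch of the two-unitary version of that circuit; the particular single qubit preparation W and the toy unitaries U0 and U1 are illustrative choices of mine, but the two measurement outcomes behave as described, with outcome 0 leaving a state proportional to (U0 + kappa*U1)|psi> and outcome 1 leaving one proportional to (U1 - U0)|psi>:

```python
import numpy as np

def lcu_two_unitaries(U0, U1, kappa, psi, rng):
    """One round of the ancilla circuit targeting (U0 + kappa*U1)|psi>."""
    d = len(psi)
    s = np.sqrt(kappa)
    W = np.array([[1, -s], [s, 1]]) / np.sqrt(1 + kappa)   # ancilla preparation
    # "select" operation: apply U0 if the ancilla is 0, U1 if it is 1
    select = np.block([[U0, np.zeros((d, d))],
                       [np.zeros((d, d)), U1]])
    state = np.kron(W @ np.array([1.0, 0.0]), psi)         # prepare ancilla, attach psi
    state = select @ state
    state = np.kron(W.conj().T, np.eye(d)) @ state         # undo the preparation
    branch0, branch1 = state[:d], state[d:]                # ancilla measured 0 / 1
    p0 = np.linalg.norm(branch0) ** 2                      # success probability
    if rng.random() < p0:
        return 0, branch0 / np.linalg.norm(branch0)        # ~ (U0 + kappa*U1)|psi>
    return 1, branch1 / np.linalg.norm(branch1)            # ~ (U1 - U0)|psi>

# toy check with assumed 2x2 unitaries and kappa = 1
rng = np.random.default_rng(1)
U0, U1 = np.diag([1, 1j]), np.diag([1j, 1])
psi = np.array([1.0, 0.0], dtype=complex)
outcome, out = lcu_two_unitaries(U0, U1, 1.0, psi, rng)
target = (U0 + U1) @ psi
print(outcome, out, target / np.linalg.norm(target))
```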
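And here is a numerical sanity check of the simplest multi-product combination, the Richardson-style weighted sum (4*S1(t/2)^2 - S1(t)) / 3 built from the symmetric Trotter formula S1, compared against S1 on its own; the random Hermitian terms and the time t = 0.1 are assumed toy inputs. Note that the combination is not unitary, which is exactly why the ancilla circuit for summing unitaries is needed on a quantum computer:

```python
import numpy as np
from scipy.linalg import expm

def s1(H_terms, t):
    """Symmetric Trotter formula: half-steps forward, then the same half-steps reversed."""
    halves = [expm(-1j * H * t / 2) for H in H_terms]
    U = np.eye(H_terms[0].shape[0], dtype=complex)
    for V in halves:
        U = V @ U
    for V in reversed(halves):
        U = V @ U
    return U

def multi_product(H_terms, t):
    """Richardson-style multi-product formula: weights +4/3 and -1/3."""
    half = s1(H_terms, t / 2)
    return (4 * (half @ half) - s1(H_terms, t)) / 3

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))
terms = [(A + A.T) / 2 + 0j, (B + B.T) / 2 + 0j]          # assumed Hermitian terms
H, t = terms[0] + terms[1], 0.1

exact = expm(-1j * H * t)
print("symmetric Trotter error:", np.linalg.norm(exact - s1(terms, t)))
print("multi-product error:   ", np.linalg.norm(exact - multi_product(terms, t)))
```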
And that’s effectively what we do, but for the purposes of this presentation I will focus on just adding two terms together because it captures all the intuition that you need. >> So I have a philosophical question. >> Nathan Wiebe: Yeah? >> The [indiscernible] here actually needs to be [indiscernible], right? So how big is the [indiscernible] of information then and why do you think the [indiscernible] information is sufficient? >> Nathan Wiebe: The actual qubit remains in a pure state actually during the entire evolution. The reason why is because when you take a look at this linear combination of unitaries that you end up getting over here this is actually just going to map a pure state to a pure state. So it’s not like you are going to end up getting a mixed state out of this linear combination. For example, you know, let’s take the worst possible, most destructive linear combination that I can imagine for these sorts of things. Imagine you want to do a linear combination if identity in zed. What you end up getting is you end up getting just a projector on to the 0 state from that. And that just ends up mapping you to a pure state. So these sorts of combinations will always end up giving you a pure state. >> [inaudible]. >> Nathan Wiebe: Yeah, sure. >> [inaudible]. >> Nathan Wiebe: One more back, okay, certainly. >> So in cases that you deal with what is the probability of getting 0? >> Nathan Wiebe: Okay. The cases of probability that I deal with depends on the operation. And the reason why, I am going to get to this in a second, but there are two sorts of errors that end up happening. It’s depends if U1 is approximately U0 or approximately negative U0. Those are the two cases that come up in the simulation algorithm. And in one of the two cases it turns out that if an error occurs it’s entirely easily correctable. In the other case it’s catastrophic and you have to throw out everything. So much of the cost of this algorithm is actually going to boil down to making sure that the probabilities of these catastrophic events occurring are vanishingly small. But, for the non-catastrophic ones the probability of success is about .5 is what we found the optimal to be. >> [inaudible]. >> Nathan Wiebe: Okay. In general U0 and U1 are going to be two different values of the same --. Let me go back for a second. This is what it is. SX over here is an approximate for the time evolution operators. So it’s going to be one of these product formulas. So in general these are all going to be of the same form, but they are just going to use a different number of steps. So one of them will be two copies of the formula with half the time and the other one will be like on copy of it with the full step. So they are going to be exactly the same form of approximation, it’s just one of them will involve many steps and one of them will involve maybe a few. >> [inaudible]. >> Nathan Wiebe: For different values of cube here. >> Different. >> Nathan Wiebe: Yeah. >> So maybe you said this already, but what do you get if you measure 1? >> Nathan Wiebe: You know what, how about I go to the slide. All right. So if I go to the slide, if I measure 0 then what I get is I get the linear combination of the two unitaries that I want to actually implement. If I measure 1 then I get the difference between those two. 
So this is one of the reasons why I mentioned it depends on whether or not the two are close to each other or very far from each other, because if they are very close to each other then this term will be approximately 0. And if it's approximately 0 it's just going to clobber your state and you get some garbage left over; whereas if they are opposite to each other then you will have approximately 2U0, which is a unitary operation. And furthermore it's a simple unitary operation that you have got an operator splitting formula for, so you can actually invert that. And so that's the basic intuition for this. So for some of the steps, some of them will be good.
>> But typical probability is about 50 percent.
>> Nathan Wiebe: Typical probability is about 50 percent, yeah, that's the best to shoot for.
>> [inaudible].
>> Nathan Wiebe: What do you mean by that?
>> You are removing the first qubit.
>> Nathan Wiebe: You are just removing the first qubit.
>> By measuring it.
>> Nathan Wiebe: The second qubit is where all of the information that you want is actually encoded. This is just used in order to allow you to do this trick where you perform a weighted sum of these two.
>> So would you do this as a [indiscernible]?
>> Nathan Wiebe: Perhaps, it's certainly along the same sort of theme.
>> So you make this unitary transformation just [indiscernible], discard it, but perhaps one thing to do then instead of trying to apply the unitary transformation to your actual states you make an [indiscernible]. If it succeeds then you [indiscernible].
>> You try to recover from the other one?
>> Nathan Wiebe: Yeah, it's one of the things that I remember we looked at. We weren't able to get any traction on that particular idea to use some sort of a gate teleportation type idea to do this. That would clearly be ideal, but I will be happy to talk about it later.
>> The problem is that these are not [indiscernible].
>> Nathan Wiebe: Yeah, exactly. All right. So that's basically it. As I said before, but now it's a belabored point, some of these are bad errors and some of these are not so bad errors. So just to give you a basic overview of how this would work for the simplest possible case, imagine what you have is you are just using this Richardson extrapolation formula here. In the simulation you use this particular formula and here S1 could represent the simplest Trotter formula you can use, the symmetric Trotter formula. So you use that formula with a weight of 4 here and a weight of minus 1 there and you use those circuits in order to implement this particular linear combination. Again, in general what will happen with this is that the error correction can be done with high probability in this case because S1 has opposite sign to that S1. So essentially how it works is as follows: the flow chart is you attempt a time step with one of those terms in here. You have two options, either it succeeds or it fails. If it succeeds you go on to the next time step. If it fails, well then you attempt error correction. And the error correction that you attempt you can do using this exact same method. It turns out there is a chance of a catastrophic error when you attempt that error correction step, but that probability can be made arbitrarily small. So you attempt that, and again if it fails you abort the simulation, otherwise you try that time step again. You repeat this process until you are done.
>> So you are doing this at every time step.
>> Nathan Wiebe: You are doing this at every time step.
>> And you could have, you know, a million time steps.
>> Nathan Wiebe: You could have a million.
>> And if one of them has a chance of failure abort?
>> Nathan Wiebe: Yeah, the second last time step could be the one that fails. Nonetheless when you consider all of these possible things that can go wrong and all of the additional costs involved in making these errors small, you notice that what ends up happening is we end up getting an exponent up here which is smaller than the exponent that you end up getting for Trotter-Suzuki. Just because even despite all of this, these problems with error correction and the like, the advantages of these multi-product formulas are so great that you do end up getting a benefit out of it. This polynomially ends up reducing the scaling with the error tolerance over the best known results using just pure Trotter formulas.
>> That includes all the abort probabilities also?
>> Nathan Wiebe: Yes. And so basically what this means is we have given a method that is superior to product formula simulation methods in nearly every way. Furthermore this could lead to improved quantum chemistry algorithms and also, most importantly I think, it gives a new way of thinking about simulation that doesn't directly involve a logical mapping between the initial system and the final system. The dynamics effectively that are going on in the quantum computer are actually not the same sort of dynamics that are going on in the physical system, because of the non-unitarity. And it says that this paradigm has some advantages over the traditional paradigm. But the real question is, “Well, is quantum simulation a killer application?” And I would say that for scientific computation arguably you can say yes, however for general purpose computation probably not. Really the gold standard of this talk is VisiCalc. And quantum computation is not something that is going to get people who are interested directly --. Sorry, quantum simulation is not going to get people who are interested in, say, data analysis to directly get out and be excited about quantum computation. So I don't think it's quite as compelling for general purpose computation. But undoubtedly it could be incredibly useful for scientific computation. However, the next thing you could think of is maybe, actually, because you have such a compelling application for scientific computation, maybe it's possible that some problems within the scientific computation community or simulation problems can be mapped to other classes of problems that are more relevant. And this is exactly the same sort of intuition that's used in the Harrow, Hassidim and Lloyd algorithm for solving linear systems of equations. So this leads to the second part of my talk which talks about linear systems and data fitting. So the linear systems algorithm really actually is built out of three components. And the first component is quantum simulation. The second uses essentially ideas from phase estimation and amplitude amplification in order to make everything work. And basically the way that this problem works is imagine you have some known matrix A which, multiplied by some unknown vector, gives some known data. And what you want to do is you want to find the unknown vector. So this is a standard matrix inversion problem that you want to solve, something that people do all the time. So obviously if this actually provided an exponential speedup that has no caveats this would certainly be a killer application.
But there are some pretty big caveats that we will get to in a second. The way that you approach using quantum computers in order to solve this problem is you begin with the basic problem and then you quantize it. You replace the input vector by a quantum state. And again this quantum state, because of normalization, will only be proportional to that input state. You can lose constants of proportionality with the way that they designed this. Then what you do is you invert A by diagonalization. And the way that diagonalization works is basically it uses phase estimation to store the eigenvalues of e to the minus iAt in a register. Then you divide by the eigenvalues in order to implement A inverse. And that's how the quantum algorithm essentially does its job. And the key point though is that at the end of the algorithm what you end up with is you end up with a quantum state that encodes the answer to your problem; not the answer. And that's a big drawback, because in general the size of the problem is exponential. So if you wanted to read the output out of this you would have to sample the quantum state an exponentially large number of times. And you totally lose any advantage that you could possibly get from this algorithm by doing so. So that's one of the major drawbacks of this particular algorithm. However, you can, if you are interested in this and I don't know why you would be, but what you can get out of this is you can figure out expectation values of your solution to the system of equations. There may be some application where this is useful, but I don't know what it is right now. And furthermore this work can be actually applied to solving systems of differential equations, but again you don't get the solutions there; you can only find the expectation values or other similar properties of systems of differential equations, which, although that does have some value there, doesn't give you quite everything that you would want.
>> But in the case where y is a sequence of 0s and 1s, then when you compute [inaudible]?
>> Nathan Wiebe: Sorry?
>> So if lambda --.
>> Nathan Wiebe: I think, let's go back here.
>> It's a binary problem, don't you get [inaudible]?
>> Right, the binary case.
>> Nathan Wiebe: Okay, so if you get --. You will always get a binary representation, but the way that this is actually encoded, generally speaking, is that all these real values for each of these components will be encoded as amplitudes of the particular values of this. So for example the first entry in that would be encoded as the amplitude --. Yeah?
>> [inaudible].
>> Nathan Wiebe: Right, but the amplitudes will be 0 and 1 in that case. And in that case you may be able to take advantage of the peculiarities of that particular constraint to learn more information than you would otherwise, but certainly the general problem is going to be hard. Maybe with this particular problem that you mentioned there could be a cunning way to get around it, but I don't know what that is for the moment. So there are two basic problems I guess for this. I mentioned the output problem of reading the output, but also the input could be exponentially hard to generate. So you have this really odd situation in a way with this algorithm, right. The actual hard part on a classical computer now becomes trivial, but generating the input and reading the output becomes extremely hard.
>> Well I guess to be fair it would be extremely hard on a classical computer.
>> Nathan Wiebe: It would be extremely hard on the classical computer as well, right.
>> But once you [inaudible] it once it doesn't get destroyed the first time you look at it. So it's exponentially reloading to get the accuracy that you want [inaudible]. That's the real problem there.
>> Nathan Wiebe: So clearly this isn't a killer application at all. The question is, I mean, is there something that you can do with this? Can you generalize these ideas to come up with a different algorithm that doesn't have these same drawbacks? And this is something that Daniel Braun and Seth Lloyd and I thought about when we were approaching this. And the application that we thought of was quantum data fitting. And the basic idea behind the problem is as follows. Oh, sorry, data fitting to me is such a ubiquitous task in general purpose computing that if we did get a quantum speedup for this then certainly this would satisfy the goal of this talk. It would be a killer application for quantum computation. And so the question is, “Well, does it work out?” Well let's discuss the data fitting problem and how that actually works. Imagine you have got some function Y of X and what you want to do is you want to represent this as some combination of fit functions, FJ of X. Your goal is to find weights for each of these fit functions that minimize the squared error. So that's the idea and there are many ways that you can do this. Conjugate gradient methods are a great way of doing this on a classical computer, but you can also use a linear algebra approach where you just apply an operator known as the Moore-Penrose Pseudoinverse and that operator is given down here. The key advantage of our method, even though this is actually more complicated than the inversion problem, is that this matrix F here no longer necessarily has to be square. So for example you could try to fit an exponentially large data set to a line, right. In that case your output dimension would be two; you would have two parameters to fit a line. So in that case you could actually solve the output problem. You could efficiently read all the information that you need for sure in this one particular application. The input problem still remains unfortunately, but the output problem could be resolved by this application. All you have to do is apply this Moore-Penrose Pseudoinverse. I should also mention that this pseudoinverse operation, in the case where the matrix is actually invertible, reduces to the previous problem considered by Harrow, Hassidim and Lloyd. So the way we do this is we follow the exact same sort of strategy that Harrow, Hassidim and Lloyd employed. You start with your vectors and you encode them as quantum states with each of the coefficients stored as amplitudes of that particular value, or I should say entries. Then what we do is we use a trick, because in order to leverage quantum simulation we need to have a Hermitian matrix. So we need to have something that is self-adjoint. And that generally won't happen for these fitting problems, so we use a dilation of the space. We introduce an additional qubit in order to make the matrices Hermitian in a larger dimension. And after using this trick it turns out that this F dagger F inverse is actually just F inverse squared. So after using this trick this ends up becoming that. And so this can actually be executed by using the Harrow, Hassidim and Lloyd ideas three times.
>> Did you say a qubit? Is that one qubit?
>> Nathan Wiebe: Yeah, yeah, you actually only need one in order to extend it out.
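As a purely classical numpy sketch of the pipeline just described: a least-squares line fit via the Moore-Penrose pseudoinverse, the Hermitian dilation of the rectangular F, and the "divide by the eigenvalues" step that the quantum algorithm carries out coherently through phase estimation. The toy data, the two fit functions (a constant and x) and the numerical tolerance are all assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0.0, 1.0, 50)
F = np.column_stack([np.ones_like(x), x])        # fit functions: constant and x (a line)
y = 2.0 + 3.0 * x + 0.05 * rng.normal(size=x.size)

# Moore-Penrose pseudoinverse fit: lambda = (F^dag F)^-1 F^dag y
lam_direct = np.linalg.pinv(F) @ y
print(lam_direct)                                # close to [2, 3]

# Hermitian dilation: C = [[0, F], [F^dag, 0]] is self-adjoint even though F is not square.
n, m = F.shape
C = np.block([[np.zeros((n, n)), F],
              [F.T, np.zeros((m, m))]])          # F is real here, so F^dag = F^T

# "Invert by diagonalization": expand [y, 0] in the eigenbasis of C and divide the
# components by the nonzero eigenvalues, which is the step phase estimation enables.
eigvals, eigvecs = np.linalg.eigh(C)
coeffs = eigvecs.T @ np.concatenate([y, np.zeros(m)])
inv_vals = np.zeros_like(eigvals)
nonzero = np.abs(eigvals) > 1e-10
inv_vals[nonzero] = 1.0 / eigvals[nonzero]
lam_via_eigenvalues = (eigvecs @ (inv_vals * coeffs))[n:]   # lower block holds the fit
print(np.allclose(lam_direct, lam_via_eigenvalues))          # True
```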
The idea is as follows: you generate the initial data, then you use Harrow, Hassidim and Lloyd to implement F dagger. I should say you don't quite use it because you don't need to divide by the energies in that case. In that case you multiply by the energies, but everything else is exactly the same. Then you use the algorithm two more times in order to implement these F inverses and that's what you do. The cost of doing so is as follows: the key point to take home is that it's efficient, there is nothing that is exponential in this problem. S is the sparseness of the matrices in question, kappa is the condition number and epsilon is the error tolerance. Oh, and capital N is the largest dimension of the matrix F. So you can do a number of things that are actually kind of interesting with this. So, yes?
>> How are F dagger and F inverse related?
>> Nathan Wiebe: Ah, how are F dagger and F inverse related? Okay, they are not directly.
>> [inaudible].
>> Nathan Wiebe: You are thinking unitary. If they were unitary then F dagger and F inverse would be the same, but in this case it's not square. So this is just the conjugate transpose of a rectangular matrix.
>> No I am just asking, because normally this whole thing just collapses down to an F dagger cubed.
>> Nathan Wiebe: Yeah, yeah.
>> Will that work in the square case? Does it help you at all? No, because the whole point is the Moore-Penrose Pseudoinverse?
>> Nathan Wiebe: Yeah, exactly.
>> Okay.
>> Nathan Wiebe: The whole point was to use the Moore-Penrose Pseudoinverse in order to write this as a series of matrix multiplications that you can then carry out on a quantum computer. So that's the basic idea. And there are actually a few things that you can do with this, although the output state in general can be exponentially large and it could be, for the fitting problem, hard for you to actually learn the fit. It turns out that you can actually learn the quality of the fit efficiently regardless of the dimension, which may come as a surprise. Also, of course, if there are a small number of fit functions that are being used you can directly learn the fit.
>> Don't you still need an oracle [indiscernible] for fit functions?
>> Nathan Wiebe: Okay, so several things. I have not mentioned two oracles. So there are two oracles, well actually no, effectively there is one oracle, sorry. The oracle will provide you all the matrix elements you need for the fit functions in a particular basis, or in fact in general you can imagine it in several bases if you have different natural bases for each of the fit functions that are used there. It is also useful to imagine that the input state is generated by an oracle.
>> So are the oracle costs in this [inaudible]?
>> Nathan Wiebe: Yes, actually these costs that I mention are effectively the query complexities. So the way that you would go about and learn the quality of the fit is really straightforward. What you do is you will note that F lambda gives you the approximation to the data set that you have and Y is the precise data set you have. So if you want to compare the two just use a swap test. The swap test is a quantum mechanical test that allows you to efficiently determine the difference between two quantum states. So you repeat this process some number of times and then you end up finding the quality of this fit by comparing. So one of the things that's actually kind of interesting is actually the cost of this algorithm is less than the algorithm I gave previously.
The reason why is because the F here clobbers one of the F inverses that was used in order to construct lambda. So actually making F lambda is cheaper than making the previous state.
>> [inaudible].
>> Nathan Wiebe: Ah, very good question. So there is only one step using our approach where it actually pays to do amplitude amplification. Let's go back, and this is where the [indiscernible] comes from. This is the only step that we use amplitude amplification in. These two steps over here, it turns out if you try to use amplitude amplification after that you have to reflect around the evolved state and that's just too expensive. The cheapest thing to do is just use amplitude amplification here and not use it on those two.
>> So if it's not square what is F [inaudible]?
>> Nathan Wiebe: Excellent question. So F here, sorry, what I should be saying here is that these are not the rectangular matrices in the original problem. These are the Hermitian extensions of the original matrices. So from this perspective, once you have dilated it to a Hermitian matrix, F inverse of that dilation makes sense and it is square.
>> But F dagger is not F inverse?
>> Nathan Wiebe: F dagger is not F inverse.
>> [inaudible].
>> Nathan Wiebe: Okay. The next thing is, if you also want to learn the value of the fit functions, one of the things that you can do is you can just use compressed sensing as a tomographic technique in order to learn what the amplitudes are for the output states. And that cost ends up coming in as this additive term here, where M prime is the number of fit functions that you actually use and epsilon is the error tolerance. Another thing that I should mention that's really kind of cool that you can do with this is that you can actually find a good set of fit functions, if you don't know one a priori, by measuring the output of the algorithm, because of the way that everything is encoded: it encodes the answers as amplitudes. So the functions that have the highest amplitude are the ones that are most significant to the fit and the ones that you are most likely to measure when you measure the state in the end. So what you can do actually with these algorithms is something really kind of cool. You can say, “All right, I don't have any idea what a good set of fit functions would be to represent this data. I will start with a complete or near complete set of fit functions”. Then you measure the output state and you sample from that. You find the ones that appear most frequently and cut out all the rest of them and that will give you, in some cases, a good guess for a set of fit functions that you should use for the problem, even if you can't a priori know which fit functions are best.
>> But I still have the classical cost of doing the tomographic work.
>> Nathan Wiebe: The thing is you only have to do actual tomography later, right. Once you have found the fit functions that you want to use in the final tomography then you do the tomography. So this process over here you can think of as like a compression process. You start with a set of fit functions and you use this to throw out the ones that aren't useful. So just to compare to Harrow, Hassidim and Lloyd: preparing the initial state is inefficient with Harrow, Hassidim and Lloyd's approach. Unfortunately, with quantum data fitting the same thing in general is true. Learning the output isn't necessarily a problem with our data fitting algorithm in many cases and furthermore we can actually estimate the fit quality efficiently.
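For the fit-quality step described above, here is a small sketch of the swap test statistics; classically we can just compute the acceptance probability (1 + |<a|b>|^2)/2 and simulate the repeated measurements. The vectors standing in for |F lambda> and |y>, and the number of repetitions, are arbitrary assumptions:

```python
import numpy as np

def swap_test_p0(a, b):
    """Probability of measuring 0 on the swap-test ancilla: (1 + |<a|b>|^2) / 2."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return 0.5 * (1.0 + abs(np.vdot(a, b)) ** 2)

def estimate_overlap(a, b, shots, rng):
    """Repeat the swap test and convert the observed 0-frequency back into |<a|b>|^2."""
    zeros = rng.binomial(shots, swap_test_p0(a, b))
    return max(0.0, 2.0 * zeros / shots - 1.0)

rng = np.random.default_rng(5)
y = np.array([1.0, 2.0, 3.0, 4.0])
F_lambda = np.array([1.1, 1.9, 3.2, 3.9])        # stand-in for the fitted approximation
print(estimate_overlap(F_lambda, y, shots=10_000, rng=rng))   # squared overlap ~ fit quality
```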
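And here is a sketch of the sampling idea for finding a good set of fit functions: measuring the output state returns index j with probability proportional to |lambda_j|^2, so the indices that show up most often flag the fit functions worth keeping. The amplitudes, the number of samples and the cut-off threshold below are all arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
lam = np.array([0.9, 0.05, 0.4, 0.01, 0.02])          # assumed (unnormalized) fit amplitudes
probs = np.abs(lam) ** 2 / np.sum(np.abs(lam) ** 2)

samples = rng.choice(len(lam), size=200, p=probs)     # simulated measurement outcomes
counts = np.bincount(samples, minlength=len(lam))
keep = np.flatnonzero(counts > 0.05 * samples.size)   # keep the frequently seen functions
print(counts, "-> keep fit functions", keep)
```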
So this gives you actually something that's useful that you can do directly from this particular application. Also, going back to something that I mentioned previously, a very natural application of this is parameterizing quantum states, because if you have a quantum state that's yielded to you by an algorithm or some other device then you have already solved the input problem. You can just use this as an alternative to tomography and get an approximate reconstruction of the quantum state that way. And yes, unfortunately it also strongly depends on the sparsity of the matrix in the basis that you choose for the problem. So you really do have to be clever about the way you choose it or use good simulation methods. So is data fitting a killer application? Well I honestly have to say I think no. The reason why is because of the fact that it still has many of these problems that the Harrow, Hassidim and Lloyd algorithm had. In order for it to be a killer application what I would love to be able to do is to get some random data set that somebody has and just feed this in and get it to process that random data set incredibly quickly. The problem is if you have to generate that data set via a look up table you are not going to be able to prepare that initial state efficiently. So what that means is that this can't be used in order to solve many general purpose problems that people are interested in, but it comes agonizingly close. Maybe there are particular problems that people are interested in, in a cryptographic setting or something else, where the initial state actually can be efficiently prepared. And I think that taking a look at studies along these lines may actually end up leading to the first true killer application for quantum computation. Thank you very much. [clapping]
>> Krysta Svore: Thanks Nathan, are there any questions?
>> Let's think of a time when all the big data in the world will be stored as quantum [indiscernible].
>> Nathan Wiebe: Yeah, if it's already at a point where your resources are already quantum then, yes, I agree with your remark.
>> So do you have examples of problems where the state [indiscernible]?
>> Nathan Wiebe: Well, yeah, actually I do have problems. This of course betrays my background, but imagine what you have is you have an untrustworthy quantum simulator that produces some particular output data. And you would like to learn what that data is. Well then you can use that device to generate your input state, run it through this algorithm and then fit it and learn the fit quality efficiently. However, I think that there still is more work that probably should be done. One of the things that we haven't done is we haven't looked at particular sets of fit functions that can be or cannot be generated efficiently. And that would probably be sort of the next step towards making this much more practical, by identifying some problems that are concretely useful for people to solve and actually looking at the cost of implementing the oracles needed in order to perform this.
>> [inaudible]. It was pretty obvious that it was stepping in [indiscernible]. It's a step towards universal computing. People like Bill Gates who went and tried to build a personal computer of course thought about various applications; it was obvious to them that [inaudible]. Now here we are still in the stage that even if these are two killer applications, both of them are still specialized devices which do a few special tasks, far removed from universal computers.
>> Nathan Wiebe: So your question is --. Sorry, I am trying to parse your question.
>> He is trying to figure out if VisiCalc is a reasonable thing to be comparing against for the general case?
>> Nathan Wiebe: Well I think it is a reasonable, the spirit of VisiCalc certainly is a very reasonable thing to compare against, in that really ultimately what I would like out of a general purpose killer application for quantum computation is something that really gets a bunch of people who already have substantial processing needs to be able to say, “Wow, I really could use a quantum computer, should it ever come out, in order to handle my daily problems”. And I think for many of these people who aren't already in the scientific community such an application hasn't come out yet.
>> So I would say view it as a cloud service, no one is going to have a machine in their office. But could you sell a service that did their data analysis much faster and much better for a general case, which is the movement [inaudible]?
>> [inaudible].
>> Well I don't know how specialized it has to be if you actually solve the killer application problem. That's the point, it should be general [indiscernible]. So are you familiar with [indiscernible] on [indiscernible]?
>> Nathan Wiebe: No I am not.
>> Okay, the late 90s. It's extremely efficient preparation of lots of data into states in a circuit model and it might be worth looking at for some of the preparation models. The other is [indiscernible].
>> Nathan Wiebe: I am very familiar with that.
>> Okay, as I figured.
>> Nathan Wiebe: So that work in particular, I feel that some of the ideas can be used in order to speed this up. In particular their ability to avoid using amplitude amplification and measurements in the early parts of the procedure ought to be useful for reducing this dependence on the condition number in particular. I suspect that we might be able to reduce this as low as kappa to the third by adapting their techniques. That's actually ongoing work right now that we are looking at to optimize these.
>> And have you thought at all about the fitting questions, the [inaudible]?
>> Nathan Wiebe: Well again, it really depends on the fitting functions and the basis in which you want to do this. I haven't thought about it in great detail. I know that this is actually a very real and serious problem, but I think to some extent you need to have an idea of what you would like to use this algorithm for in the first place in order to do that. So that's what we are doing right now. We are trying to think about what the actual best use cases are going to be for our algorithm and then attack those. However, natural candidates for these sorts of fit functions would be, for example, bounded polynomial functions that only end up having support over small areas, or trigonometric functions. Those are the sorts of things that would certainly be very easy to do either with a quantum Fourier transform or directly using a sparse representation.
>> I think that would, I am just agreeing with you that this would be a reasonable approach, because the machine learning people only pick the fitting functions they use because they are easy to compute and are easy to separate, which are the same things. You find things that are quantumly easy to do. It doesn't matter what the actual function is as much as the attributes.
>> Nathan Wiebe: Right.
>> Krysta Svore: Are there any other questions? If not let's thank Nathan again. [clapping]