Krysta Svore: So welcome back. Now we're going to hear from Patrick Hayden on the role of quantum information in space and time. Patrick Hayden: All right. Well, first I'd just like to thank Krysta and Microsoft Research for the invitation to be here and attend the summit. The past couple of days have been really interesting. And I've really enjoyed getting to meet some of the people in the quantum architectures group. It's a very different and encouraging approach, I think, to quantum computation -- or to building a quantum computer -- than what I'm used to. I'm used to talking to a lot of theoretical computer scientists and theoretical physicists. And to talk to people who have actual, honest, good skills making computers work -- I think this is obviously a crucial part of the enterprise of building a real functioning quantum computer. So I really like what's happening here at Microsoft. The other preface is that the topic of my talk, you may have noticed, is a little bit different than what was circulated the first time in the notes that some of you have. I was going to speak about a topic that was properly quantum computation, and when I arrived here Sunday evening, a Microsoft researcher, who shall remain nameless, expressed dismay that I had decided not to speak about a topic related to quantum gravity. So I felt sheepish. So I'm going to tell you a little bit about the intersection between quantum information and the physics of space and time -- the quantum mechanical physics of space and time. This is clearly quite distant from what we were discussing this morning, but I think it's one of the virtues of quantum information as a field that it really spans what will hopefully become practical, all the way to the truly fundamental. And the same people actually can, and often do, work wearing both hats, depending how they feel when they wake up in the morning. So for those of you who don't work -- and I think most of you don't work -- in quantum gravity or such areas, you can get a bit of a flavor for how quantum information can inform the way we think about some interesting questions in fundamental physics. So I'm basically going to try to illustrate this claim that quantum information is useful, or can teach us interesting things about the structure of spacetime, through two examples. In the first one, I'm essentially going to give you a complete characterization of all the ways that quantum information -- you know, qubit-type stuff -- can be distributed in space and time. And the answer is going to turn out to be beautiful and simple. And in the second half, I'm going to provide you with an information theoretic interpretation of the length of a curve in space. Now, that may not mean anything at all to you at this point, but hopefully by the end of the talk, what it would mean to have an information theoretic interpretation of the length of a curve -- at least hopefully you'll get that much out of it. So let's start with some quantum information bedrock, something that I think everyone in the room should be able to agree on and be familiar with. And that is that cloning of quantum information is impossible. So if you had a quantum system in an unknown quantum state, which we call phi, you cannot build a machine that will produce two quantum systems both in that same quantum state phi, that would work for all phi.
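For reference, the usual one-line linearity argument behind the no-cloning theorem, sketched in symbols (any two basis states |0>, |1> will do):

```latex
% Suppose a single unitary cloned every input: U|\phi\rangle|0\rangle = |\phi\rangle|\phi\rangle.
% Feed it a superposition and compare the two answers:
\begin{align*}
U\,\tfrac{1}{\sqrt{2}}\big(|0\rangle+|1\rangle\big)|0\rangle
  &= \tfrac{1}{\sqrt{2}}\big(|00\rangle+|11\rangle\big)
  && \text{(linearity of } U\text{)}\\
  &\neq \tfrac{1}{2}\big(|0\rangle+|1\rangle\big)\big(|0\rangle+|1\rangle\big)
  && \text{(what cloning would demand),}
\end{align*}
% so no such U exists.
```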
And this is -- well, I guess we could phrase this another way, if we wanted to: quantum information cannot be replicated in space. And this is really a crucial feature of quantum mechanical information, because it is a consequence of this no-cloning theorem that quantum information cannot be measured without causing disturbance. And that is really at the root of all of the cryptographic applications of quantum mechanics. And I guess we heard a few of them this morning. So quantum information cannot be replicated in space. But this is a talk about spacetime. So let's think about this from a slightly more global perspective. So here's a spacetime diagram: the horizontal axis is position, the vertical axis is time, and on my diagram -- well, it won't matter here, but light travels at 45-degree angles, upwards and to the right and upwards and to the left. So if we have some quantum information that's localized at a point -- let's say it's a spin, and that spin has moved around a bit as a function of time -- then it traces out a trajectory in spacetime like so. And kind of obviously, right, the quantum information follows the spin around. And so here we have the same quantum information replicated at many points in time. It's kind of trivial but it's true. This is what happens. And in fact not only is it possible for quantum information to be replicated in time, but it's necessary. The quantum mechanical evolution law is unitary. Unitary transformations are reversible; they don't destroy any information. And so while quantum information cannot be replicated in space, it has to be replicated in time. You have no choice. And so this first half of the talk is just going to be about taking that observation -- once we know that, okay, quantum information can be replicated in this context -- and trying to figure out exactly what types of configurations are possible. Yeah? >>: I've often been curious, how did this quantum bit get generated in the beginning? Patrick Hayden: So that's a good question. There are different ways of formulating an answer -- different, what turn out to be equivalent, ways of answering that question. One way of formulating it would be to say that there is some kind of referee or adversary who prepares the system and doesn't tell you what state he or she prepares it in. And then afterwards, when all is said and done, the referee is going to perform some test to see whether you've succeeded at whatever task is defined. Another way of defining it would be to say that that system is actually entangled with another system that we don't talk about, which we'll call the reference, and we check to verify that the entanglement is preserved. >>: Beyond the [indiscernible] from a physics point of view, [indiscernible] two groups of physicists are arguing which came first, space or time, right after the big bang, and each of them thought the others were dummies for not seeing it according to their view. But I've often wondered, where did quantum bits originate in the creation of our universe? Patrick Hayden: Oh, my. Okay. Well, that would take us very far afield. I would be very happy to discuss it, but I think that for the moment we should defer trying to answer. I think we could all discuss that question with passionate opinions. So I think I told you what the goal is. So let's just formalize this a little bit more. What am I going to mean by localizing quantum information in different regions of spacetime?
I first have to tell you what regions of spacetime we're interested in. And so here I have -- well, a pair of regions, each labeled by a pair of points. The lower point is, say, Y and the upper point is a zed. And I'm going to define something called the causal diamond. And the causal diamond is the intersection of all the points in the future of the Y -- so this is a spacetime diagram, light travels at 45-degree angles, so everything that could be affected by what happens at Y is inside this vee -- with the past of the corresponding zed. So everything that could have affected zed, that comes from its past. So I have two of these causal diamonds there. Another way of phrasing this is that this causal diamond, Dj, consists of all of the points that can be affected by something that occurs at Y and can also affect the outcome of -- or the state at -- zed. Well, I was going to ask you a question. So suppose we have some quantum information. Now we're going to ask: can these two causal diamonds both contain the same quantum mechanical information? Is that something that can be done? And in this case it can be done, and it can be done very trivially. So if we just have the system prepared at some point in the distant past, in this state S, then I can just carry S to this point P. And that point P is actually inside diamond D0, so the information has been localized to diamond D0, and then we just redirect it again along this curve, which is 45 degrees, so it's not violating relativistic causality, and it ends up in diamond D1. So in this case we have succeeded in replicating the information in these two diamonds. And I guess we can also say, conversely, if these two diamonds were such that all of the points of diamond 0 were spacelike separated from all the points in diamond 1 -- meaning it was impossible to send a message, it would violate causality -- then it would be impossible. Because if we had succeeded we would have violated the no-cloning theorem. So in this very simple setting with two diamonds in one space dimension and one time dimension, the story of information replication is very simple and uninteresting. All we have to say is that if we define a relation -- the two diamonds, D0 and D1, are causally related if, and only if, it's possible in principle to send a message from one to the other, and I don't care if it's a message from D0 to D1 or D1 to D0 -- then I can replicate the information if, and only if, they're causally related. And that's the story. And we do it in this trivial way. But, again, if this were the whole story, this would be sort of an uninteresting topic. So now you should stop and think: okay, as connoisseurs of quantum information, what interesting quantum information phenomena are you familiar with? Well, you're probably familiar with teleportation, in which quantum information is somehow split. It doesn't follow a trajectory, but part of it is carried in these classical bits and part of it is carried in entanglement. You're probably familiar with quantum error correction, in which somehow quantum information is delocalized in such a way that we can't learn anything at all by looking at a few qubits; we have to look at a large number simultaneously. So more generally, clearly we're going to have to have a more sophisticated answer.
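Just to pin down the geometry in symbols before going on -- in the usual causal-structure notation (the notation here is an editorial addition, not from the slides), the diamonds being used are:

```latex
% Causal diamond with past tip y_j and future tip z_j: the set of points
% that y_j can influence and that can in turn influence z_j.
D_j \;=\; J^{+}(y_j) \,\cap\, J^{-}(z_j),
% where J^{+} and J^{-} denote the causal future and causal past of a point.
```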
And I just want to point out that, while I don't want to get into the details in this talk of exactly what I mean by replicating the information, I do want to emphasize that it can be turned into an operational question, where it's not just kind of vague notions of replicating information. If you want to make it operational, you could imagine your referee stationing agents at these Y's. And from time to time one of the agents will just decide, okay, I want to check. You claim the information is actually present in diamond D0 -- show me the money; show me that it's really true. So if this agent decides to check, what's going to happen? Well, I can take his request and race into the future -- try to access all points in the future that are part of D0. So I can race to the point P and grab the particle that was at point P and redirect it to zed. And so the information appears at zed, and then I can confirm to you that the information was in the diamond because I can actually present it to you. So I can turn this into a kind of game in which information replication corresponds to winning the game and failing to replicate the information corresponds to losing the game. And just a little bit about this causal diamond geometry. I drew the base point and the top point as having the same spatial coordinate, but that kind of privileges a certain set of coordinates in a way that really isn't in the spirit of special relativity. So as long as the upper point is in the future of the lower point, that gives me a perfectly good causal diamond. And you should observe that as the upper point gets closer and closer to being lightlike separated -- so to being on a light ray from the bottom point -- the diamond appears to get thinner and thinner. And in the examples that I'm going to show -- I guess the only examples I'm going to show, just because they're easier to draw -- my diamonds are actually going to look like line segments. But you should think they're not actually line segments, they're just very long and very thin. So just keep that in mind. So this is the example. My favorite example, I love it. A few people in the room have seen it. It really illustrates what you need to understand if you want to understand how information can be replicated in spacetime -- almost everything you need to understand. So I call this the causal merry-go-round. I have three causal diamonds, so three regions of spacetime. And let's see how they're arranged here. My Y's are on the vertices of an equilateral triangle, slightly crushed by the aspect ratio here. And my zeds are on the midpoints of the edges between those vertices. But they're shifted in time. So you see the vertical axis is time here, and I have two spatial dimensions. And they're shifted in time just long enough for a signal to get halfway along an edge. Right? And so the causal diamonds are in fact these degenerate line segments that are like light rays. And in fact the red arrows are also light rays, because of the symmetry of the situation. And I would like to replicate the information. You know, the question is: can I replicate the information? And the structure here, you know, it's designed to be, as I said, this causal merry-go-round. Because there's a point in the 1 diamond that's in the future of the 2 diamond, there's a point in the 0 diamond that's in the future of the 1 diamond, and there's a point in the 2 diamond that's in the future of the 0 diamond. Right?
But somehow this relation -- this property -- is intransitive. There's no point in the 2 diamond which is in the future of the 1 diamond. So you can go one way around, but you can't compose this property. And so if we're going to try to replicate the information naively, what would we do? We might say, okay, start with this information, which is localized at S, and let's just carry it to, say, Y2. Right? And so the information has then entered the 2 diamond. So I put the information in the 2 diamond, and then I could maybe carry it along this light ray to the point zed 1. And then the information would also be in the 1 diamond. But then I'm sunk. Because from the 1 diamond there's just no way to get to the 0 diamond. So the 0 diamond is left out. So I can get the information into two of them but not into three. And it seems like here you're kind of really up a creek, right? But there is a way to make this work. And the way to make this work is to encode the quantum information into a particular quantum error correcting code. And I apologize for the nonstandard notation to the people who are into quantum error correcting codes. What I mean here is a code consisting of three particles, such that if I collect any two out of the three I'll be able to reconstruct the quantum information. So such codes exist. I know that the quantum architecture people are quite familiar with quantum error correcting codes. If you know qubit codes, you may know the further fact that there is no qubit code with this property, but there is a qutrit code. And that's perfectly good for our purposes. (A concrete sketch of such a code appears below, after the conclusion of this part.) And what we're going to do, once we've encoded the quantum information that was originally at this point S into three particles, such that with any two out of the three we can reconstruct the information, is just send one particle to each of the Y's. And then from the Y's, we'll have the particles traverse the red light rays. So that's consistent with relativity. And you may have missed it, but now two different particles have passed through each of the causal diamonds. So let's just rewind that to make sure that you see what happened. So at the end point, of course, every diamond contains one particle. And at the beginning point every diamond contains one particle. But they're not the same ones. Because the particles travel along the red light rays, moving from one diamond to the next. So each diamond contains two out of the three particles, and two is enough to reconstruct the quantum information. And voilà, the quantum information is replicated in each of these causal diamonds. So that's a kind of simple, nice example. But it obviously doesn't cover the general case. You could have all kinds of crazy configurations of causal diamonds in spacetime. So let's look at a slightly more crazy configuration. So here we have four diamonds. And the picture is -- or the setup is -- that each of these diamonds is on one of the faces of a cube. And again the vertical direction is time. And the question would be, okay, can we replicate the information here? And, you know, if you're like me, presented with this scenario, I kind of threw up my hands and said I don't know. What could I check? The one thing that I know is that I shouldn't violate the no-cloning theorem. So it should be true that, if I'm going to succeed in replicating the information, every pair of these diamonds should be causally related. There should be some way to send a message from one to the other.
Because if there is any pair that was actually causally unrelated, meaning all the points were spacelike separated, then to replicate the information in them would mean that I'd cloned the information -- that the same information was at two spacelike separated points. And that's the first thing to check. Well, it turns out that every pair of these diamonds is causally related. There's always a way, for each one of these -- for example, from the 0 diamond I can send a message to the 1 diamond; maybe from the 3 diamond, that Y3 point, I can send a message to the 0 diamond, and so on. So every pair is causally related. And so the obvious thing to check, the violation of the no-cloning theorem, doesn't rule out this picture. So what comes next? What's the next most complicated constraint? I couldn't come up with any. Neither could my student, Alex May. And eventually we proved a theorem that in fact the no-cloning theorem is the only constraint on the replication of information in space and time. So if you have any configuration of causal diamonds, they can contain the same quantum information if, and only if, every pair is causally related. And the equivalent way of formulating this is to say that these diamonds can each contain the same quantum information if, and only if, there is no obvious violation of the no-cloning theorem. And it didn't have to be this way. Yeah? >>: Is it true that this [indiscernible]? Patrick Hayden: Like a trivial consequence of Lorentz -- >>: No, is it trivial that this definition or construction is [indiscernible]? Because like you want -- Patrick Hayden: Yes, well, I don't know if it's trivial. But the reason I talk about causal diamonds is because in a relativistic theory, you can attach a density operator to the causal diamond. And basically you can foliate the diamond by spacelike hypersurfaces, and there's a unitary time evolution from one to the next. So there's effectively just a single density operator for the diamond. And so the question of whether the information is there is whether there's a fixed unitary transformation that will perform the decoding. So yeah, it's well defined. Okay. So the answer turns out to be very simple. And I don't really know what message to take from this, except that it didn't need to be so pretty. And I take that to be some kind of indication that there's some beautiful compatibility here between quantum information and, I guess, relativistic causal structure. Just as an aside, most of you will not be familiar with this, but there has been a controversy raging in the quantum gravity and string theory community over the past couple of years -- well, actually over the past 40 years, but it flared up in the past couple of years -- about the fate of information in black holes. And people, myself included, have gotten themselves completely wrapped up in knots and confused in thought experiments that actually involve the cloning of information in spacetime. And black holes have this annoying property of seeming to become cloning machines. And because we got so confused along those lines, that was really the motivation for this work: to try to understand the replication of information in spacetime in a situation that's much less exotic -- just Minkowski flat spacetime. But even there, there are surprises and much more interesting structure than you might have thought. So the conclusion for Part 1 is that there's a surprising variety of different ways to replicate quantum information in spacetime.
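Here is the promised sketch of the two-out-of-three code: a minimal numerical check in Python (assuming numpy; the encoding is the standard three-qutrit scheme of Cleve, Gottesman, and Lo) that no single share carries any information about the encoded state -- so, the encoding being an isometry, any two shares together can recover it:

```python
import numpy as np

# Three-qutrit code: any 2 of the 3 shares suffice to recover the logical qutrit.
# Logical codewords (Cleve-Gottesman-Lo):
#   |0> -> (|000>+|111>+|222>)/sqrt(3)
#   |1> -> (|012>+|120>+|201>)/sqrt(3)
#   |2> -> (|021>+|102>+|210>)/sqrt(3)
def basis(i, j, k):
    v = np.zeros(27, dtype=complex)
    v[9 * i + 3 * j + k] = 1.0
    return v

codewords = [
    (basis(0, 0, 0) + basis(1, 1, 1) + basis(2, 2, 2)) / np.sqrt(3),
    (basis(0, 1, 2) + basis(1, 2, 0) + basis(2, 0, 1)) / np.sqrt(3),
    (basis(0, 2, 1) + basis(1, 0, 2) + basis(2, 1, 0)) / np.sqrt(3),
]

def encode(logical):
    """Map a length-3 amplitude vector into the 27-dimensional code space."""
    return sum(a * c for a, c in zip(logical, codewords))

def single_share_state(state, which):
    """Reduced density matrix of one share (trace out the other two)."""
    psi = state.reshape(3, 3, 3)
    others = [ax for ax in range(3) if ax != which]
    return np.tensordot(psi, psi.conj(), axes=(others, others))

rng = np.random.default_rng(7)
logical = rng.normal(size=3) + 1j * rng.normal(size=3)
logical /= np.linalg.norm(logical)

psi = encode(logical)
for share in range(3):
    rho = single_share_state(psi, share)
    # Each share alone is maximally mixed, independent of the input state:
    # it reveals nothing, so the complementary two shares hold everything.
    assert np.allclose(rho, np.eye(3) / 3)
print("every single share is maximally mixed; any two shares can decode")
```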
This discussion contains as a special case the theory of quantum secret sharing, which some of you may be familiar with. It interfaces with the theory of quantum error correcting codes. And of course I would love to convince some quantum optics people to build this thing. And we're working on that. Yeah? >>: So we know there are various types of error correction codes, and they have different [indiscernible]. I wonder what those theorists say about your problem of [indiscernible]. Patrick Hayden: Well, if I were going to give an entire talk about the subject, I would explain to you how to do the general case. And to solve the general case, it turns out that the natural way to do it is that you need a quantum error correcting code with some very interesting properties. So you're going to be correcting for losses, but there are going to be N squared shares to the code, and you're going to lose all but about N of them. So you're going to recover from losses where you lose almost all your qubits. And in the usual quantum error correcting codes that people talk about, of course, that would be impossible, because it would mean that you could clone. So the different qubits in the code are not treated on the same footing. There's some internal structure. So we've been playing a lot with these codes and I'd love to discuss them with you. And right now we're actually trying to -- or we've made -- continuous variable versions of these codes, because we want to convince quantum optics experimentalists to do this. And it's actually easier to do it as a continuous variable code than as a qubit or qutrit code. But, yeah, if you just used the usual parameters, you would conclude that this is kind of impossible. Or you would do it using a recursive construction in which the number of qubits per diamond ends up being -- well, if there are N diamonds, something like N factorial. So it would be just absolutely impossible. But with good codes you can make it reasonably efficient. So that was Part 1. There was no gravity in Part 1; it was just the causal structure of special relativity and how it interfaces in interesting ways with quantum information. So Part 2 is about the connection between information theory, quantum information theory, and holographic spacetime. And I promised you some kind of quantum information theoretic interpretation of the length of a curve, whatever that means. So what is this holographic principle? The idea, proposed by Susskind and 't Hooft back in the 90s, is that all information in a region of space can be represented as a hologram living on the region's bounding surface. And that's kind of some words, you know -- what would that mean? Well, at some level it sounds crazy, right? If you have a solid with a bunch of particles arranged like this, and each particle has some number of states, say K states per particle, then clearly the total number of states we have available here is going to grow exponentially with the volume. And so the entropy of this thing is going to be proportional to the volume. And, well, the statement of the holographic principle is that this is actually wrong -- that in a volume of space, in fact, the number of states is only going to grow exponentially with the area. And that sounds crazy, but it's not. And the reason is that the most entropy dense object that can exist in the universe is a black hole. And the entropy of a black hole is proportional not to its volume, but to its area.
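The formula being invoked here is the Bekenstein-Hawking entropy, for reference:

```latex
% Black hole entropy grows with the horizon area A, not the enclosed volume:
S_{\mathrm{BH}} \;=\; \frac{k_B\, c^3 A}{4 G \hbar}
\;=\; \frac{A}{4 G_N} \quad \text{(in natural units)}.
```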
So if you try to cram a bunch of information into a volume of space, eventually the thing is going to collapse to a black hole. And the number of bits that you've stored there is not going to be proportional to the volume of that region of space, but to the area. Because it's going to be the entropy of the black hole. And of course -- >>: The area of the boundary? Patrick Hayden: The area of the boundary, yeah. And so I guess a way of thinking about this is that somehow the universe is not built out of Lego bricks; the universe is built out of shadows. If you think about Plato's cave, where you try to infer what's happening, infer reality, by only seeing the shadows playing on the wall -- in fact all there is is shadows. But that's kind of mumbo jumbo. We can make this -- or Juan Maldacena back in '97 provided us with -- a concrete realization of this holographic principle. And we don't need to know any detail about this really, but the idea is that the quantum gravity physics of a d+2 dimensional spacetime -- so I have one time dimension and d+1 space dimensions, this bulk, right -- is equivalent to some physics without gravity living on the boundary of that spacetime. So the physics without gravity turns out to be a quantum field theory with a special symmetry, conformal symmetry; they call it a CFT, a conformal field theory. But this actually realizes the holographic principle. Because all of the physics of this bulk, this d+2 dimensional thing, is completely equivalent to physics of the boundary, which has one less spatial dimension. And there's a dictionary that allows you to go back and forth between questions about the boundary theory and questions about the bulk. And it sounds very kind of abstract, but you can just think of the boundary theory as being some material. I know Dave Wecker yesterday was talking about simulating the Hubbard model. You can think of it as being some slightly more exotic version, something like the Hubbard model near a phase transition. And that's what this material at the boundary looks like. And it turns out that if you want to solve problems about this material, in many cases you can translate them into a quantum gravity question. And maybe the hard problems about your material become relatively easy general relativity problems. They'll look completely different. So this is what's called the AdS/CFT correspondence. Yes? >>: I'd like to offer a quick thought experiment. So there is a [indiscernible], there is a surface. And this basically says that the surface describes everything. Patrick Hayden: The surface is -- >>: Describes all the information. Patrick Hayden: Yes, okay. >>: But we can do something like X-ray tomography on this volume, and quickly discover that there are some three-dimensional structures that are really there, that are important. So how would you actually reconcile the view of -- Patrick Hayden: So the question is about counting the number of degrees of freedom. So you wouldn't deny that the bulk exists. We live in the bulk -- >>: No, I'm not so sure. Patrick Hayden: Yeah. But the point would be that what you thought were independent degrees of freedom -- right, if you tried to align atoms in some rectangular prism, like fill up some volume with atoms, you would have thought that your atoms were independent degrees of freedom. The issue is that when you pack them densely enough, gravity becomes important. And ultimately they collapse to black holes.
So if you write down the big list of all possible states of the system, it's much smaller than you thought it was. So that's really the point. It's not that the bulk doesn't exist. Or maybe some people would argue it doesn't, but -- >>: Is it similar to a unique solution for [indiscernible] problems? Patrick Hayden: In some ways, yes. Okay, so just a final point about this AdS/CFT correspondence. If we look at one slice of time, then, in order to make this precise and make it work, the geometry of this time slice is that it's a negatively curved space of maximal symmetry. And so this is hyperbolic space. And if you want to measure distances between points -- this is one of Escher's renditions of hyperbolic space -- what you do if you want to measure the distance between this point and that point is you think about all the different possible ways to get between them and you find the one that contains the fewest fish. So it's the fish-counting metric. And you can see the fish are big in the middle and they get smaller and smaller out toward the boundary. In fact the fish become infinitely small out toward the boundary. So that's how you measure distances in this world. Okay, I might actually have time to do this. We're going to talk about -- this is going to be about information theory. So what kind of natural information theoretic quantities might be lying around? Well, entropy. The information theoretic quantity par excellence. And so we're going to think about being in just the non-exotic boundary theory, this material, and we want to calculate the entropy if we just look at some part of it -- how much uncertainty do we have about the state? So, again, I'm not sure about the background of the audience, but if you have a quantum mechanical state and you have some uncertainty about that quantum mechanical state, then the correct way to describe it is with something called the density operator. And the density operator is Hermitian, it has eigenvalues, and the eigenvalues are non-negative and sum to 1. So you can think of them as a probability distribution. And the entropy of that region is nothing but the Shannon entropy of that probability distribution. Shannon entropy I imagine all the computer scientists are familiar with, from Huffman coding and whatnot. And so that's what we're actually talking about. And it's a measure of the uncertainty of that region, or its entanglement with the rest of the world. So here we have an interval of the boundary theory, which I'm going to call A, and we want to figure out what its entropy is. Now, a number of people here have worked on simulation or condensed matter theory. And this is a well-defined calculation. There's a great big enormous operator. You want to diagonalize it, find its eigenvalues, evaluate this function. And sometimes people can do it in heroic tour de force calculations. But we're going to see what the dictionary of AdS/CFT tells us is the easy way to calculate it. And in this holographic dictionary, the answer is that this entropy is a constant -- 1 over 4 times Newton's constant -- times the minimal area of some object. And my objects here are these curves, these gamma A's, that start and end on my interval A on the boundary. So my interval A on the boundary has two end points. So I think about curves that penetrate into the bulk and terminate on those end points. And there are a whole bunch of such curves.
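In symbols, the prescription being described is the Ryu-Takayanagi formula, with the minimization running over bulk curves gamma_A anchored on the endpoints of A:

```latex
% Boundary entanglement entropy = minimal bulk length (the "area" here):
S(A) \;=\; -\operatorname{Tr}\big(\rho_A \log \rho_A\big)
\;=\; \frac{1}{4 G_N}\,\min_{\gamma_A} \operatorname{Area}(\gamma_A).
```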
I just find the one that has minimal length -- which is what "area" means in this context. And this entropy is nothing other than the length of the minimal curve -- well, the minimal length among all curves. And that's, generally speaking, a much easier calculation. Instead of trying to diagonalize some matrix, which is probably 10 to the 30 by 10 to the 30, all you have to do is a little bit of simple geometry. You write down the geodesic equation, solve it, and evaluate the length. And so this is generally much, much easier, and it gives you the right answer. And I mentioned -- well, I don't know if I did mention; I did mention -- that the entropy of a black hole is proportional to its area. And this formula generalizes that fact. And it generalizes it in the following way. If instead of talking about an interval in my boundary theory, I just take the whole boundary, then you have to ask, well, what are the curves that I'm allowed to minimize over? Well, it's all of the curves that sort of wrap around the interior. And the one with the minimal length is the one that gets hung up on the black hole itself, so its length is nothing other than the length of the black hole horizon in this case, which is its area in the appropriate sense. So this was proposed by Ryu and Takayanagi back in 2006, I think, and it's passed many tests. And I can't say it's proved, but it's understood why this should be true. Okay. >>: What happened to string theory? Doesn't this really need strings to reconcile -- Patrick Hayden: Oh, excellent question. So I'm actually not an expert on string theory at all, but Maldacena arrived at this proposal using string theory. So he looked at the same string theoretic system in two different limits and he got two very different looking descriptions, but because they both came from the same underlying string theory, his conclusion was that they should be the same thing. >>: That's one way to prove theories. Patrick Hayden: Yeah. Exactly. But this has actually become an industry. People use AdS/CFT, this correspondence, without knowing any string theory. Like it's become quite a popular thing to think about applications of AdS/CFT in condensed matter -- trying to understand exotic condensed materials using this correspondence, where what you're trying to understand about your material may be much simpler in the dual gravitational picture. I think Raghu has actually done some work like this. >>: So an argument could be raised that the presence of matter causes gravity by virtue of the entropy that's possible because of the large configuration space that makes it possible. Patrick Hayden: Can you repeat that? >>: The reason gravity is associated with the presence of matter is because matter is the way to build high entropy configurations. Patrick Hayden: Okay. Yes. Provisionally. All right. So now, entropy, you know, as a starting point, is already kind of an information theoretic quantity. But again there are a lot of computer scientists in the room. And you probably have a gut feeling for what entropy is. It gives you the optimal compression rate for a sequence: the entropy of some source tells you the rate at which you can compress samples that are taken from that source, say, independently. And the same thing is true in quantum mechanics. If you know the entropy of some source, that tells you the minimal number of qubits per copy, per sample, that are required to compress this thing without distortion.
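As a toy illustration of the "diagonalize and evaluate" route to the same number (a minimal sketch assuming numpy; entropy in bits, i.e. qubits per copy):

```python
import numpy as np

def von_neumann_entropy(rho):
    """Shannon entropy (base 2) of the eigenvalues of a density matrix."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]            # drop numerical zeros
    return float(-np.sum(evals * np.log2(evals)))

# A qubit maximally entangled with the rest of the world has reduced
# state I/2, so faithful compression costs one qubit per copy:
print(von_neumann_entropy(np.eye(2) / 2))   # -> 1.0
```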
And so we then have an information theoretic interpretation of this geodesic curve. It's the minimal number of qubits required to compress the information in this boundary interval and send it somewhere else. If somebody held this material, you know, in principle, and wanted to take a part of it and send it to a friend, you can ask how many qubits would be required. And it's going to be governed by this entropy quantity. But of course we had other curves out here. We had these other curves that weren't the geodesics, that weren't the minimal ones. And we know that their lengths are generally going to be longer, just like non-optimal compression protocols are going to use more qubits. So the question would be: maybe there's a correspondence between these non-geodesic curves and non-optimal communication protocols. Of course the answer is going to be yes. I should just say, another way of thinking about compression in a quantum mechanical context is, instead of counting the number of qubits that need to be sent, you could say, well, if Alice is going to try to teleport this A part to Bob, how many Bell pairs would she need? And it's, again, just governed by the entropy, because she compresses and then she teleports. Okay. Krysta Svore: One minute. Patrick Hayden: Oh, really? Okay. Let's see. Well, I guess I'll just say that -- I'm going to have to go quickly here. A general curve -- what you can do if you actually want to calculate the length of a general curve, and this was done by these authors here, is that you could try to approximate it by segments that are geodesics that terminate on the boundary. Because geodesics that terminate on the boundary, we understand what those mean. And this is roughly how you do it. You have a bunch of intervals, you subtract off some overlaps. And in the limit where the shift between the intervals becomes infinitesimal, you actually reproduce the length of the curve. And you get a formula for the length of a general curve in terms of things that we kind of know already. But the formula looks like an entropy of something minus an entropy of a part of that something. And again, those of you who know information theory -- what is the entropy of something minus the entropy of a part of that something? It's a conditional entropy. Right? And conditional entropy, despite its defects, you know, its quirks -- in quantum mechanics it can be negative, and a negative uncertainty is a bit of a strange thing; you'd need to be more than certain, as was observed by these scientists -- nonetheless actually has an operational interpretation: if you want to ask how hard it is to teleport a bunch of A's, you know, A's that are parts of some quantum systems, to Bob, and Bob already has part of that system, the B's, then the cost, in terms of Bell pairs, is the conditional entropy. And that cost can be either positive or negative. If the cost is positive, of course, it means that you actually have to use up some Bell pairs. If the cost is negative -- you know, we're at Microsoft, where you're familiar with negative costs; those are called profits -- then you actually earn some entanglement out of this process that you can use in the future. So just fast forwarding a little bit here. But the basic idea is that the length of a general curve in spacetime ends up being the cost for a process where Alice is trying to teleport information corresponding to bits of the curve to Bob, but Alice and Bob are constrained geometrically in where they can act.
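In formulas, the state-merging cost being invoked -- this is the 2005 quantum state merging result of Horodecki, Oppenheim, and Winter:

```latex
% Bell-pair cost for Alice to merge her share A with Bob's share B:
\operatorname{cost}(A \to B) \;=\; S(A\,|\,B) \;=\; S(AB) - S(B),
% which, unlike the classical conditional entropy, can be negative:
% a negative cost means the protocol nets out Bell pairs for later use.
```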
So like here, Alice and Bob at any given time have to act in one interval. Bob already has part of it, so in the communication the cost is going to be a conditional entropy, and then they move along to the next interval. And when you add all of these up, you actually recover the formula for the length of the curve. So I apologize for having raced ahead a little bit there, but that is the story: the length of at least a convex curve is the minimal entanglement cost for Alice to transfer the boundary state to Bob when Bob is restricted to act locally in some intervals that are determined by the geometry of the bulk curve. And that's the information theoretic interpretation of a curve in space. So the conclusion is that bulk geometry and boundary entanglement are intimately connected. And I think that nontrivial results from quantum information theory, like this state merging from 2005 that I had to blow through a little bit, can teach us things about the geometry of spacetime that would not have been accessible and understandable without quantum information theory. There is a bit of a movement right now to try to see whether we can see the emergence of spacetime from the structure of entanglement. We don't know the extent to which that will be possible, but these are some tantalizing hints that something along those lines should be true. So thanks for your attention. [Applause] Yes? >>: Very elegant. I thought Hawking radiation solved the problem of conservation of information. Are you saying there's some question about that now? Patrick Hayden: Yes. So I didn't talk much about the information paradox, but this correspondence, like Maldacena's AdS/CFT, back in the late 90s caused people to essentially think that the information paradox had been solved. The reason being that in this setup you could describe something that looked like black holes, and those black holes had some evaporation process. But they have this dual picture, this dual picture where there's no gravity and it's just standard quantum mechanics. And standard quantum mechanics, we know, is unitary -- it doesn't destroy information. So the reasoning was that black hole evaporation should be unitary. Now, I think most people still believe that to be true. But understanding in detail how information comes out, and how our sort of semi-classical understanding of physics can be consistent with unitary black hole evaporation -- I think pretty much everyone who has thought about this is just confused at the moment. I mean confused to the point where there's actually disagreement as to what happens, not when you hit the singularity of a black hole, but if you're falling into a large black hole: as you cross the horizon, that can be a region of arbitrarily small curvature, so you shouldn't expect anything special to happen as you cross the horizon. So we think that if you're free falling across the horizon, nothing special should happen. But there is a significant minority of people now who would claim that when you hit the horizon, that's the end of spacetime, or you burn up, or something horrible happens -- like a horrific violation of general relativity. And this is because they can't figure out how the entanglement stitches everything together properly without violating cloning. Krysta Svore: So one more question while Mario sets up.
>>: So we learned that what you mention is [indiscernible] quantity and [indiscernible] -- Patrick Hayden: Relative entropy? >>: Just [indiscernible]. Patrick Hayden: Okay. Yeah. Oh, yes. >>: And there are entropy notions that are the sum of [indiscernible]. Patrick Hayden: Yes, so I totally glossed over that story. And in order for this correspondence to work really nicely, and to get general relativity and sort of smooth geometry in the bulk, you need a lot of degrees of freedom per site in the boundary. And that, I think, can replace the many copies. There's some kind of thermodynamic limit happening in the boundary already. So I think, although we haven't proven this yet, that the one-shot entropies will actually coincide with the von Neumann entropy. But otherwise, there always have to be caveats in what I said, as far as many copies, blah, blah, blah. Thanks for the question. Krysta Svore: So now we're going to hear from Mario Szegedy on what condensed matter physicists would rather not talk about. He'll maybe share three negative results. Mario Szegedy: Well, just to be brief. So probably all of you, or some of you, know this Barbie doll that said that math class is tough. And actually Microsoft redesigned it, and this technically-advanced Barbie doll now says: computing mean values and ground states of local interaction systems is tough. Actually this Barbie doll has become an instant success with toddlers. And I'll explain why. Because actually what she says is true: computing mean values and ground states is actually harder than some think. So I want to just talk briefly about theorems and results. And my talk is going to be just the opposite of Patrick's talk, because I am just talking about mathematics and not intuition. It's like a Hungarian hobby to get deeply involved in proofs without stating the theorem. And it's probably not going to be as interesting or exciting. So these three parts are completely disjoint, so if you don't like the first, or you don't like any part, you can go to the coffee and come back for the next part. So the first is actually not my result, but a result of Alistair Sinclair and Piyush Srivastava. It says the mean magnetization in the ferromagnetic Ising model is actually #P hard to compute. Now, as for my result: I started to work -- actually it's very exciting research now, with Srivastava -- and this is just a complete byproduct; we just observed that some of our thoughts actually imply the first result in a very simple way. And so I won't even tell you any of the definitions, except that this Z_I is the partition function of the ferromagnetic Ising model, and the average magnetization is expressed by taking the derivative of this polynomial and just creating this expression. So the statement is that computing this is #P hard. So don't worry about #P. #P means it's harder than NP, it's harder than quantum computing. So it's just really hard. So actually there is a multivariate version of this partition function. And the proof uses, maybe in an intriguing way -- I mean, both the Sinclair-Srivastava proof and ours use this multivariate version. So you don't need to know even what this polynomial is for the proof, just a couple of facts. So the first fact is that Z_I -- the partition function itself -- is #P hard to compute.
And actually the only thing that stops you from concluding that this logarithmic derivative is #P hard is that there could be a common divisor of Z_I and its derivative. So when we look at this formula, it somehow simplifies. So the first thing to understand -- actually the only thing to be proven for this theorem -- is that this does not happen. So what I want to prove is that this, let's say, mystery polynomial and its derivative don't have a common root. Now here comes a famous theorem. And even if I don't say anything about my own research, well, this is a fact that is very good to know: the Lee-Yang circle theorem. >>: Sorry, [indiscernible]. Mario Szegedy: Well, it's N particles. >>: Okay. Mario Szegedy: The Ising model with N particles. It's a finite model, so #P hardness of course is classified in terms of N. So the Lee-Yang -- and by the way, this N is the number of particles. So the Lee-Yang theorem says: take the partition function of the ferromagnetic Ising model; it can be 0 only in the following ways. Either all the x_i's are on the complex unit circle -- so this is a complex thing -- or some are inside the complex unit circle and some are outside. But it cannot be that all are inside. So that's a very famous fact. And this helps prove that the ferromagnetic Ising model does not have a phase transition at nonzero field. Now, that was an important fact; we need yet another fact. It is that if I take this polynomial and the partial derivative with respect to X1, then on the unit circle -- and actually here I could even take different lambda_i's on the unit circle -- this partial derivative is not 0. And there is yet another fact, the last one: this mystery polynomial is multi-affine. What that means is that if I am looking at it only as a polynomial in x_i, then it's linear in each x_i separately. But of course it's not linear in all of them together, so it's called multi-affine. So now let me at least state the theorem so that you know it; the theorem is as I said. What I want to prove is that Z_I and this derivative don't have a common root. So there is no lambda_0 that is a root of both; that's what we are going to prove. And the proof is strikingly simple -- it's the remainder of this slide and the next page. So assume -- and I am just using those facts -- assume that actually there is a common root of Z_I and this derivative. Oh, and I am overloading the notation: I am using Z_I as a multivariate polynomial like that, and I am also using it as a single-variable polynomial, where for each x_i I just substitute lambda. So here of course we are talking about a single-variable polynomial. But what does the Lee-Yang theorem say about this single-variable polynomial? If we plug in lambda, lambda, lambda everywhere, and lambda were larger than 1 or smaller than 1 in absolute value, then we would be either outside the unit circle everywhere or inside the unit circle everywhere, and this configuration is forbidden for the roots of the multivariate polynomial. So how do I get the contradiction? I want to get a contradiction from this assumption. So let's assume that there is such a lambda_0, and I am substituting lambda_0 into Z_I everywhere, and now I am perturbing these values a little bit.
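To keep the argument straight, here is a condensed reconstruction, in symbols, of the perturbation he walks through next (notation is an editorial addition):

```latex
% Setup: Z_I(x_1,\dots,x_N) is multi-affine, p(\lambda) := Z_I(\lambda,\dots,\lambda),
% and we assume a common root on the unit circle: p(\lambda_0)=p'(\lambda_0)=0, |\lambda_0|=1.
% Pull every coordinate inside the circle and shift the first one by \delta:
\begin{align*}
Z_I\big((1-\epsilon)\lambda_0+\delta,\,(1-\epsilon)\lambda_0,\dots\big)
&= Z_I\big((1-\epsilon)\lambda_0,\dots\big)
 + \delta\,\partial_1 Z_I\big((1-\epsilon)\lambda_0,\dots\big)
 && \text{(multi-affinity in } x_1\text{)}\\
&= O(\epsilon^2)
 + \delta\,\partial_1 Z_I\big((1-\epsilon)\lambda_0,\dots\big)
 && \text{(since } p(\lambda_0)=p'(\lambda_0)=0\text{)}.
\end{align*}
% Because \partial_1 Z_I does not vanish near the unit circle (fact 4), some
% \delta = O(\epsilon^2), hence |\delta| < \epsilon, makes the whole thing zero
% with every coordinate strictly inside the unit circle -- which Lee-Yang forbids.
```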
And then I am perturbing it in two ways, and it's important that it's two ways. So first I am just attaching a multiplier 1 minus epsilon everywhere, and the other way is that I am also adding a little perturbation, delta, to the first coordinate, and I claim that from these two perturbations I can make a 0. So by fixing epsilon and delta appropriately, I can create a 0 in an illegal place. If delta has smaller absolute value than epsilon -- the first perturbation with epsilon moves me inside by epsilon, and here I am adding a little delta smaller than epsilon -- then I am always inside the circle. And thus, by the Lee-Yang theorem, it's impossible. So that's what I want to show: that I can create this situation with some epsilon and delta. So how do I choose that epsilon and delta? That's the only thing I have to tell you. And here is actually the end of the proof, with the [indiscernible] sign. And let me just show you the picture. So, again, I perturbed everything this way, and then here I moved a little bit away. Now, what was our assumption? It was that lambda_0 was a root of the polynomial itself and also of the derivative. So lambda_0 was a common root of both. Therefore the polynomial and its derivative both vanish there, so if I am now making the first perturbation, then at the point ((1 minus epsilon) lambda_0, (1 minus epsilon) lambda_0, ..., (1 minus epsilon) lambda_0) the multivariate polynomial takes a value which is second-order small in epsilon. So now I am using the fact -- that was one of the facts, fact 4 -- that the partial derivative does not vanish in X1. So whatever small value the Z_I takes at this point, I can compensate for it with delta, because of the multi-affine property: the function looks like this, so that when I am at this point here, it's almost the same as when I am at that point there, except that there is this additional term. But because this is not 0 and this is very small, I can set delta so that it is going to be smaller than epsilon, and so this whole thing disappears; as a matter of fact, this is how I should set it. And so that was actually a 30-page paper by Alistair and Piyush, so actually what you should appreciate is exactly the simplicity of this proof and nothing else. But our research hopefully yields some fruit very soon related to the Lee-Yang theorem. So now for a completely different thing, Part 2. And so let me switch gears here. Okay, so here is the hope. So what came before was actually classical, although I did not say so -- this was the classical ferromagnetic Ising model. Now I am talking about quantum. And I am talking about the area law, and Patrick has already talked about it, and some of it I might very briefly repeat. So why does the area law matter in condensed matter physics? Because the hope is that if the area law holds -- like, if I have a material and I am looking at the ground state of the Hamiltonian created by the interactions between the particles, then we hope that the area law holds for the ground state. And so we can describe the ground state by cutting the material in two pieces.
And since there is small interaction between the pieces -- that's what the area law means -- we can put together the information and get the entire information about the ground state. So that's the hope. And our result is that actually the area law does not hold -- at least the general area law, the very general area law for graphs, does not hold. So Patrick has talked about it, and I'll just repeat that the area law requires a notion of entanglement entropy. And it's good to have as many measures of entanglement as we can, because that's what we study in quantum. And the first measure is of course whether something is entangled or not. So if it's [indiscernible] then it's not entangled; and if you have a bipartite system -- so the label 1 means it is in the first system and 2 means it is in the second system -- and it looks like this, then it is entangled. So now this grandiose measure, this entanglement entropy, which we think is the most elegant measure of entanglement, is defined as follows. We have the two parts of the system, and we can either trace out the first part or we can trace out the second -- you choose which one -- but whichever part you choose to trace out, you get a density matrix, and the entanglement entropy is defined as simply the entropy of that density matrix. If you do the other one, you just get the same. And here is an example -- so basically, since Patrick explained it better, I just wanted to explain it this way: when you trace out Bob and you have a state like [indiscernible], what do you get? Well, you get a probability distribution on states. So what is the amount of information needed to describe Alice's state -- well, you know, Alice's state after the measurement? So that's the other explanation of mixed states. The first is with the density matrix, but the other explanation of mixed states is just a probability distribution on states. So if all you know is that it's one of these states, each with probability [indiscernible], how much information do I have to tell you? Well, it's log of N bits of information. So here there could be a bit of confusion between base-e log and base-2 log, depending on whether you talk with a physicist or with a computer scientist, but I don't want to be a judge here. Actually, on one slide I had one and on the other I had the other, probably just to confuse everyone. So this is another way to explain entanglement entropy: if we trace out Bob, it's the information I have to tell you if I want to specify Alice's state. So what actually gives hope to the area law -- and here I am actually telling you what the area law is, this general area law, which is actually a true statement, mathematically proven -- the catch is, only for commuting terms. So these local Hamiltonians that describe the interaction have to commute with each other; terms that don't touch always commute, but when they are neighbors we have to assume they are commuting. So if they are commuting, if they are local -- so we normalize them appropriately -- then the area law says that for such a Hamiltonian, if it is gapped, meaning that the energy difference between the energy of the ground state and the [indiscernible] state is some constant delta, then the entanglement entropy between two parts, like here the black part and the white part, is upper bounded by the size of the cut.
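Schematically, the commuting-case statement being described (an editorial paraphrase, with constants suppressed):

```latex
% H = \sum_e h_e local, normalized, commuting, and gapped; cut the system
% into parts A and B across a set of edges \partial A. Then
S(\rho_A) \;\le\; c \cdot |\partial A|,
% where c depends only on local quantities (local dimension, gap), not on
% the total system size.
```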
And the cut is just the number of edges we have to cut between the two parts. So here it's three. So it's upper bounded -- here, by three times whatever the constant is -- and that's what the area law says. And it's true; in the commuting case it is just simply true. And our result is that in the non-commuting case, for general graphs, it's not true. And we know actually that for the general non-commuting case in one dimension it is true, due to Matt here, and then there are some [indiscernible] improvements. And our result says that the general case is not true -- I mean, the non-commuting case is not true -- and even, so, the graph that we found is actually just one edge, with its Hamiltonian [indiscernible], so the area law would say that the entanglement entropy is upper bounded by 1, or a constant, and it's simply not true. So that's the second negative result. And I have some time for the third, I hope, and I will be very quick. Minus two minutes. Krysta Svore: Fifteen minutes. Mario Szegedy: Okay. That's great. So for the next one we have more time. I have three more slides. So when I go to, let's say, a [indiscernible] talk, then I see the following rosy picture about condensed matter physics. Okay: so we cannot compute the ground states for every Hamiltonian we want, but at least we can compute ground states for some small subset, like matter. So, the small subsets. So at least if we have some interaction graph -- I mean, it's local and grid-like, and the Hamiltonian is gapped and the interactions are translation invariant; it has all these good properties -- then we can say something about the ground state. And now the third negative result says that actually we cannot. Even if you want to compute some mean value, that's already hard to tell. So enter the Kolakoski sequence. The Kolakoski sequence is a fun sequence. And when I worked with [indiscernible], everyone was looking at the Kolakoski sequence, starting from Jeff [indiscernible] and others, of course -- just those who were crazy enough to work on this little stupid thing. So we have a sequence: 1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2. So what is this sequence? Look at the runs and the lengths of the runs. The length of this run is 1, the next length is 2, then 2, 1, 1, 2, 1. So what do you see? Well, you see that you actually get the same sequence back. So actually this is a unique sequence -- well, if you start with 1, it's the unique sequence with this property, that it is the same. And if you start with 2, then there is another unique sequence. But let's assume that you start with 1. So since 1965 it has been unknown whether the number of 1s in this sequence is roughly equal to the number of 2s. So we have a very simple rule, and we simply don't know whether, as N goes to infinity, the fraction of the entries that are equal to 1 tends to 50 percent. (A short generator for the sequence is sketched below.) So how does it relate to condensed matter? Well, we can create sort of a crystal out of this Kolakoski thing. And that's really my last slide. Although I should have continued with some other examples, because this crystal has one problem, which is that it's not completely translation invariant, because it's just defined on this quadrant and this corner point is different. So if I wanted to be completely translation invariant, I would have to look at [indiscernible], for instance, and things like that. But this is just ongoing research anyway. So I don't know how far I'll get. And these are not huge things.
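Here is the generator sketch referred to above -- plain Python, nothing beyond the run-length rule itself; the 50 percent question is about the limit, which code can only probe empirically:

```python
def kolakoski(n):
    """Return the first n terms of the Kolakoski sequence 1,2,2,1,1,..."""
    seq = [1, 2, 2]
    i = 2                      # seq[i] is the length of the next run
    while len(seq) < n:
        symbol = 3 - seq[-1]   # runs alternate between 1s and 2s
        seq.extend([symbol] * seq[i])
        i += 1
    return seq[:n]

s = kolakoski(10**6)
print(s[:12])               # [1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2]
print(s.count(1) / len(s))  # empirically close to 0.5; unproven in the limit
```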
But notice that if I write down the Kolakoski sequence horizontally and also vertically, and put the two things together, then I can create local, translation invariant rules whose unique solution is exactly the Kolakoski sequence. So in this material -- say 1 is spin down and 2 is spin up -- the magnetization of the 1s and 2s equals out exactly if that 50 percent property holds for the Kolakoski sequence. It's a very simple rule, and yet no one knows. So it shows that even such simple problems are hard. So sorry for being so negative. This is my negative talk. Thank you very much. [Applause] >>: I have one question. [Inaudible] that you mention is computing such a number. And it's basically saying it's computing exact values [indiscernible]. Mario Szegedy: Very good question. Yeah. Very good question. It can be done. So that's already positive -- more positive than me. It can be approximated. >>: [Indiscernible] like is it constant? Or -- Mario Szegedy: I think within a factor of 1 plus epsilon, and you choose epsilon. >>: But is it polynomial in epsilon? Mario Szegedy: That I am not sure -- whether the epsilon gets into the exponent or it's a factor in the running time. I mean [indiscernible] or something like that. Any other questions? [Applause] Krysta Svore: Okay, so now we're going to hear about some quantum algorithms. Matt Hastings is going to talk to us about quantum chemistry by quantum simulation, and about recent results on the algorithms and estimates for the runtime. Matt Hastings: Thanks. So this is work that was done with a number of people, some of whom are here -- Dave Wecker and Nathan Wiebe and Matthias Troyer -- as well as some other people who are not here right now. If I'd given this talk half a year ago it would have been quite a pessimistic talk, but now it's a fairly optimistic talk. The question is: if we want to simulate quantum chemistry on a quantum computer -- and this has always been brought forth as an application of quantum computers; simulating quantum chemistry was one of Feynman's original reasons for being interested -- how hard is this? It's not so hard to see that you can do it -- simulate any quantum system, at least if you're talking about simulating the time dynamics of the quantum system -- in polynomial time on a quantum computer. But really the question is: what is this polynomial, and just how long is it going to take? And we've had a large number of improvements. Some of them are algorithmic improvements, where it's really a computer science issue -- we're doing exactly the same mathematical computation, just with a faster algorithm. Some of them are physics inspired, relating to reordering some of the computation to reduce the error. And some of them are improved error bounds, or an improved understanding of the error. And they've all led up to a really large change in how fast we think this problem can be simulated. So why is quantum chemistry worth simulating? The reason is that you can do interesting problems with a small number of logical qubits. And of course the catch is that I'm talking about logical qubits, and this might require an enormous number of physical qubits.
But you can already do interesting, relevant problems with on the order of a hundred logical qubits, with the number of qubits required going up as the size of the molecule increases or as the basis-set size increases. So what is it that we want to do? For some molecule we want to estimate the ground state energy, we want to estimate some observables in the molecule, like where the electrons are, and thereby get other properties, like polarizability. And how do we do this? Well, we have the Schrodinger equation describing electrons moving in the continuum -- it was originally written down by Schrodinger, and there's this grad-squared sitting in there, acting on the wave function of the particle at some position in space. What you have to do is make it into a finite problem. So instead of having the particle at some continuum position in free space, you truncate to a finite basis set. And this allows you to represent the problem on a computer -- this would either be done on a quantum computer, or is currently being done on classical computers. There are standard basis sets that exist, there is a large literature on what good ones are, and larger basis sets will give you a higher level of accuracy. So you truncate to some finite problem, and the problem then is to estimate the energy in this basis set. The basis set gives you a set of orbitals in real space, and each of these orbitals can be filled by an up electron or a down electron -- so, counting spin, two electrons can fill each orbital. Typical basis sets come from combinations of Gaussians. And I just want to emphasize that there are classical packages -- PyQuante, which is an open source package; Gaussian is another one; Psi4 -- many classical packages that you can get, some open source, some commercially available, that have a well-developed theory for generating these basis sets and then generating the needed interaction terms in the basis set. What do I mean by the interaction terms in this basis set? Well, eventually what a package spits out is a Hamiltonian for the problem in the following form. You have P and Q labeling spin orbitals -- I'll use the term spin orbital to refer to both an orbital degree of freedom and a spin degree of freedom, so it labels spin up or down as well as some particular basis function. There will be two types of terms. There are terms a†_P a_Q with some coefficients h_PQ -- the diagonal ones give an energy term, the off-diagonal ones a hopping term. And then there are terms h_PQRS a†_P a†_Q a_R a_S, which are interaction terms. These come from the Coulomb potential. One thing I should emphasize, maybe for the physicists in the audience especially: usually in physics we're used to seeing these interaction terms looking diagonal, like a†_P a_P a†_Q a_Q. And such terms do exist in our problem, but here, because of the particular basis sets we work in, we really have all of these terms being non-zero. You can destroy electrons in one pair of orbitals and create them in another pair of orbitals. It's a complicated structure of terms that is present in these quantum chemistry problems. So how do we do this? Well, the particular representation we've been using -- there are other ones which are more compact in terms of qubits.
They actually don't gain that much in terms of the qubits required, and they have a large time overhead. We take two qubits per orbital -- hence one per spin orbital -- and we just have a very naive representation of the state on our quantum computer. A qubit up means there's a particle in that spin orbital, and a qubit down means there's no particle in that spin orbital. So it's a very straightforward transcription from the states of the molecule, where the electrons are sitting in these orbitals, to the state on the quantum computer. We then prepare the system in a simple state -- where simple just means something we can easily prepare -- which has a reasonable overlap with the ground state. We'd like it to be as good as possible. There are product states, these Hartree-Fock states, which I'll say a little more about later, which have reasonably good overlap for the sizes we're interested in. Then you use quantum phase estimation. Quantum phase estimation is an algorithm which, given an operator like this Hamiltonian H, allows you to project onto eigenstates of that operator and determine the corresponding eigenvalue. You repeat this many times. The state you initially prepared did not have perfect overlap with the ground state. So the first time you measure, maybe you will project onto the ground state and get the ground state energy; maybe the next time you'll get a different energy, because you weren't perfectly overlapping with the ground state -- you had some overlap with a different state. So you get a sequence of energies out, record the lowest one seen, and that's the estimate of the ground state energy. In this talk I'm going to ignore all the issues that would occur if you can't prepare a state with high overlap. These issues are a research problem for the future; they'll get worse as the molecule gets bigger. But for the molecule sizes we can simulate on a classical computer -- and we've been doing a lot of simulation -- this seems really not to be an issue, and we expect it not to be a real issue for the early sizes we would be able to simulate on a quantum computer. So how does this quantum phase estimation work? What is the goal? You have this Hamiltonian H, which is a sum of terms that I'll write as H_k. Each of those terms H_k is one of the terms in the expression here -- given a choice of P and Q, or a choice of P, Q, R, and S, I'm just representing it schematically as H_k. So the sum of these H_k is the Hamiltonian H, and you have the unitary e^{iHt}, which describes the evolution under this Hamiltonian for a certain period of time. What you would like to do is implement a controlled unitary. You have a certain extra qubit called a phase estimation ancilla, and depending upon that extra ancilla, you either apply this unitary or you don't. And then you essentially do an interference experiment between applying this unitary and not applying it. To do this, you take your ancilla and prepare it in a superposition of up plus down. You then do this controlled phase.
And what you find is that the case where you apply the unitary picks up an overall phase, which is e^{iEt}, where E is the energy of the state. So you pick up a phase which depends upon the energy of the state, and then you measure that ancilla. You're essentially interfering two different trajectories, and in that way measuring the energy under the Hamiltonian. So in all the plots you'll see later, when I start showing circuits, you'll see this extra phase estimation ancilla. It's just there to control this unitary -- to determine whether or not we apply it. We now face an issue, though, of constructing this controlled unitary, which is e to the i t times the sum of the terms. We don't know how to do that exactly. There's a large literature on different ways of doing it, to different levels of approximation. If all the terms commuted, what we could do is apply e^{iH_1 t}, then e^{iH_2 t}, and so on. And we do have circuits that will let us implement e^{iH_k t} for all the terms I've written down -- I'll show you those circuits. However, the terms don't commute, so this would not be correct. But a simple approach is the so-called Trotter-Suzuki approach, based on the following formula: if you want to do e^{A+B}, where A and B are matrices that don't commute, you can write it as (e^{A/N} e^{B/N})^N, and the error in this expression gets smaller as N goes to infinity. Each factor here is roughly e^{A/N + B/N} up to a correction of order 1 over N squared, and so overall the error becomes order 1 over N. You can improve on this with higher-order formulas. So basically, we want to evolve for some long time to do this phase estimation. The time T that we need in the phase estimation is roughly 1 over the energy accuracy that we want to get in the end. We're trying to resolve very small differences in energy, so we need to go for a very long time. And rather than covering that long time in large steps, we first go a little bit under the first term in the Hamiltonian, then a little bit under the next term, and so on -- a little bit under each term. That's essentially how the Trotter-Suzuki expansion works. So now let me give circuits for each of these terms -- I said we could do e^{iH_k t} for each term. The simplest term you would have is a†_P a_P. This is just a number operator: it's either 1 or 0 depending upon whether there's a particle in the P spin orbital, which is to say it just picks up a phase depending upon whether that qubit is up or down. So this one is really quite simple. Remember, this qubit P will be either up or down depending upon whether a particle occupies that spin orbital, and this other wire is the phase estimation ancilla. So depending upon the value of the phase estimation ancilla, you either do or do not apply a rotation to that qubit. Because in this case -- ignore the phase estimation ancilla for a second -- e^{i h_PP a†_P a_P t} is just a Z rotation. You're just changing the phase of the up state relative to the down state, so you're just applying a rotation about the Z axis.
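(A small numerical sanity check on that Trotter-Suzuki scaling -- illustrative only, not from the talk: random 2x2 Hermitian matrices stand in for two non-commuting terms, and the deviation of (e^{iAt/N} e^{iBt/N})^N from e^{i(A+B)t} falls off roughly as 1/N.)

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_herm(d):
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (m + m.conj().T) / 2

def expi(h, t):
    """e^{iht} for a Hermitian matrix h, via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(1j * w * t)) @ v.conj().T

A, B, t = rand_herm(2), rand_herm(2), 1.0
exact = expi(A + B, t)
for n in (1, 10, 100, 1000):
    step = expi(A, t / n) @ expi(B, t / n)
    trotter = np.linalg.matrix_power(step, n)
    print(n, np.linalg.norm(trotter - exact))  # error shrinks roughly as 1/n
```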
Back to the circuit: this is just a controlled rotation about the Z axis in this case, and that's what you apply to do this operator. If you want to do a term like h_PQ a†_P a_Q, it's a little more complicated. It's a hopping term, represented by a circuit like this. By the way, if there are any questions about this one, please ask, because the circuits are going to get progressively more complicated and more involved. So what do we need to do to implement a†_P a_Q? For a second, ignore these two lines sitting in the middle right here. What a†_P a_Q does is remove a particle from one of the states and create a particle in the other state, or vice versa. So we need something that is off-diagonal -- it changes up/down to down/up, like that. That's what these basis change gates do. H here refers to a Hadamard gate: it applies a Hadamard on each of these qubits, which interchanges the X and Z bases. Then there are some CNOTs -- and as I said, don't worry too much about the stuff in between for a second. There's a CNOT from P to Q, and the effect of this CNOT is that Q now basically carries the product of X on P and X on Q: the Z on Q is now X on P times X on Q. How did that happen? Well, this Hadamard turned Z on P into X on P and Z on Q into X on Q, and then the CNOT basically added the two of them. Then we apply a controlled rotation, and we undo it all. The effect of this is to do an e^{iθ X_P X_Q}, and that's something that has the correct off-diagonal property of removing from one and creating in the other. That's the two X's. We also do the same thing in a Y basis, which gives us an e^{iθ Y_P Y_Q}. The reason for this is that if we just did the first term, we would correctly get the term where the particle hops from here to here, but we would also get a term where we go from no particles to two particles -- and combining the two cancels that out. Important point: these things I've been telling you to ignore in the middle -- what are they for? There's a sign that's supposed to enter in. These particles are electrons, and so when you move one from P to Q, there's an overall sign that should come in -- a fermionic sign -- because interchanging electrons picks up a minus sign, and when we rewrite the electrons in this basis of spins, we have to get those signs out correctly. The correct way to do that is to pick an arbitrary ordering of the spin orbitals, and having picked this ordering, you put in an overall minus sign depending upon the parity of the orbitals in between. So when going from here to here, or vice versa, you look at all the orbitals in between, count the total number of electrons in there, and check whether it's even or odd; and if it's odd, rather than doing e^{iθ}, you do e^{-iθ}. So the effect of these gates here is that we take this one, add it to this one, add it to this one, add it to this one, and in the end we succeed in counting the parity. That's how the hopping term works. The other terms, the PQRS terms, h_PQRS a†_P a†_Q a_R a_S, are even more complicated. This involves -- I'll talk about this one in one second.
It involves four different choices of orbitals, but it's essentially a more complicated version of the previous one. There are various basis change gates -- Hadamard gates -- and a more complicated fermionic string here. To get the signs, there's one string from P to Q and one from R to S: you look from P to Q and count that parity, then look from R to S and count that parity. Again there's a phase estimation ancilla, and there are lots of controlled rotation gates. So this is an incredibly complicated circuit, and that's how you implement that term. And when I say that's how you do it, I'm just quoting the standard circuits that you could look up in the literature at the time we did our first paper. Similar circuits exist for this PRRQ term, which again involves four operators, but two of them are the same. It's like a controlled hopping: depending upon whether a particle is on R, you can hop. So what's the problem with this approach? Well, one problem is that the total number of terms grows roughly as N to the 4, if N is the number of spin orbitals, because a large fraction of these h_PQRS are non-zero. So you have a huge number of terms in the Hamiltonian. Each term requires enforcing the fermionic parity. These Jordan-Wigner strings are simply those strings of CNOT gates that get the parity right, and their length is proportional to N in general -- typically the two endpoints will be a distance N apart. So that's a factor of N. Then you might ask what Trotter step is required. I said we go a little bit on the first term, a little bit on the second term, and so on -- how small do those steps have to be? According to the most naive bounds -- I say most naive, but they were the ones in the literature that you would quote -- this would be the three-halves power of the number of terms, giving us an N to the 6. And if you add it all up, you get an N to the 11 scaling, which is rather big. So we're going to improve it a lot. This relies a lot on both analytic work and numerical work. Here are some of the molecules simulated with liquid, which Dave Wecker gave a demo of and talked about yesterday. It's been invaluable, both in allowing us to simulate this algorithm for small molecules -- which we can simulate exactly, gaining a much better understanding of the error effects -- and in understanding how to change the circuits. With liquid you can make a change in how you do some of those circuits I've just shown and very quickly see how the gate count changes. And it's sort of funny that some seemingly trivial changes matter -- for example, how do you order your spin orbitals? Do you order them orbital 1 up, orbital 2 up, orbital 3 up, and so on, and then orbital 1 down, orbital 2 down, orbital 3 down? Or 1 up, 1 down, 2 up, 2 down, and so on? You get fairly large constant-factor speedups from seemingly trivial things like that. And somewhat surprisingly, most of the seemingly trivial choices were made in exactly the wrong way in the literature beforehand. Anyway, I won't go through this table -- it's from our first paper. The important thing is that even with rather optimistic estimates of how long it takes to execute a gate, if you applied these upper bounds for the circuits and so on, you got millennia to solve the problem. You got enormously long times.
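(Since the improvements that follow are all about these strings, here is a minimal sketch of the standard Jordan-Wigner mapping in the background -- the textbook operator algebra, not the paper's actual circuits: fermionic mode p becomes a qubit, with a string of Z's on the modes before it carrying the sign, and the fermionic anticommutation relations come out correctly.)

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

def kron_all(ops):
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out

def annihilate(p, n):
    """Jordan-Wigner a_p on n modes: Z string on modes < p, (X+iY)/2 on p."""
    return kron_all([Z] * p + [(X + 1j * Y) / 2] + [I2] * (n - p - 1))

n = 4
a1, a2 = annihilate(1, n), annihilate(2, n)
# {a_2, a_2^dagger} = 1 and {a_1, a_2} = 0, as fermions require:
print(np.allclose(a2 @ a2.conj().T + a2.conj().T @ a2, np.eye(2 ** n)))  # True
print(np.allclose(a1 @ a2 + a2 @ a1, 0))                                # True
```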
So now let me turn to the improvements. I have several different improvements, so I'm going to go through a couple of different topics. The first one is getting rid of those Jordan-Wigner strings, which removes one factor of N. We're going to mostly focus on the two-body terms, the ones involving four fermion operators -- those are the ones with the most terms. For the one-body, two-fermion terms you can do the same improvement trick; I'm just going to show the circuits for the two-body ones, because that's where most of the terms are. So the first thing to note: this is the traditional circuit, and we can rewrite it so that all the Jordan-Wigner strings appear outside the basis change gates. That is, rather than executing the Hadamards and then doing all this, you can interchange them and replace this CNOT with a controlled-Z, and it gives you mathematically the same operation. This might not seem like much -- I've just interchanged this one and this one, and the total number of gates is the same. However, the advantage is that previously I had to do this whole mess: first the H's, then the Y's -- H, H, H, H, then Y, Y, Y, Y, and so on -- and on each of those we had the whole string. We did the basis change, then the string, then undid the string, then undid the basis change, then did another basis change, then another string, and so on. By moving the strings outside, we only need to do them once: we do the string, we do a basis change, this thing, then -- dot, dot, dot -- repeat the circuit with the H's replaced by Y's and so on, and those Jordan-Wigner strings just sit outside. So that's a constant factor improvement, but a pretty big one. And then the crucial thing is that if we lexicographically order these strings -- we do a given PQRS, then we do P, Q, R, S+1, and we keep increasing S until we reach its maximum possible value, then we increase R by 1, and so on -- then a lot of cancellations become possible. I've just drawn one of those circuits from the previous slide, with a single basis change inside. Here comes the string going out, then the next string coming in. And if you stare at this, most of it can be canceled: this is undoing the string from one term and then redoing it for the next one. You have a CNOT from this onto this and a CNOT from this onto this, and CNOT squared is 1, so I can drop that. And the CNOT squared here is 1, so I can drop that. In fact I can drop all of this stuff right here, except for the little bit at the very end. So I've removed a large portion of the Jordan-Wigner string by doing this. There's actually a very simple way to think intuitively about what happened -- it seems like a mathematical trick, but what it means is that I need to count the parity of the sites in between. After I count the parity in between -- add it all up, 1, 0, 0, 1, 1, and so on -- when I go to the next term in the sequence, I don't need to recompute the parity of most of it; I just need to see how the parity changed.
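(A toy sketch of that cancellation bookkeeping -- illustrative only, not the paper's circuits, with each string represented just as the list of qubits whose parity it accumulates: between lexicographically adjacent terms, CNOT squared = 1 removes everything except the qubits whose parity contribution changed.)

```python
def string_qubits(p, q):
    """Qubits strictly between p and q -- one toy Jordan-Wigner string."""
    return list(range(p + 1, q))

def naive_cost(terms):
    # Build and unbuild the full string for every term.
    return sum(2 * len(string_qubits(p, q)) for p, q in terms)

def cancelled_cost(terms):
    # Keep a running parity; between consecutive terms, only the
    # symmetric difference of the two strings survives cancellation.
    cost = len(string_qubits(*terms[0]))          # build the first string
    for (p0, q0), (p1, q1) in zip(terms, terms[1:]):
        a, b = set(string_qubits(p0, q0)), set(string_qubits(p1, q1))
        cost += len(a ^ b)                        # CNOT^2 = 1 drops the rest
    return cost + len(string_qubits(*terms[-1]))  # unbuild the last string

# Lexicographically adjacent terms share almost their whole string:
terms = [(0, 40), (0, 41), (0, 42), (0, 43)]
print(naive_cost(terms), cancelled_cost(terms))   # -> 324 84
```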
So you really just need to do one extra CNOT. You can improve this a little more if you add an extra ancilla that keeps track of the fermionic parity, so that rather than keeping a running sum inside the string, you pass the parity to the ancilla and come back. This allows even more reordering, and more cancellations become possible. Another advantage of the ancilla is that it allows us to do a trick called nesting. You might have one term that acts on a particular set of qubits -- for example, this set and this set; those would be the four acted on, and we've just drawn it several times to indicate that there will be the H, H, H, H, then Y, Y, Y, Y, and so on. But you might have another term that acts on these two and these two. This one moves an electron from here to here but does not change the parity of this string; that one moves an electron from here to here but does not change the parity of the other string. So we can nest them in such a way that we execute them at the same time. We can improve our parallelism. Liquid was already looking for certain kinds of parallelism: it would realize that if two gates acted on different qubits, you could execute them at the same time -- you could just push everything to the left as far as it could go until it hit some other gate acting on that qubit. But here we take advantage of the fact that even though the gates in this circuit don't commute individually with the gates in that circuit, there are ways in which they commute as wholes that allow us to execute them simultaneously. This gives another factor of N in reducing the parallel depth. So here's a figure which is almost deliberately meant to be hard to read. It's an example of a circuit that does three different h_PQRS terms -- one particular choice of PQRS, another particular choice, and another. It shows a circuit parallel depth of 151. That's not completely obvious from the slide, because you might say, wait, that one and this one are not aligned above each other -- but when that depth of 151 is computed, everything is slid as far as it can go to the left. It's clearly a complicated circuit, but it's complicated in the sense of the same structure repeated many times with different sets of strings. After we do the nesting and the cancellation, the parallel depth gets reduced -- in this case only by a factor of 3, since this is just a few of these circuits. And this circuit is some crazily complicated thing that you would never have been able to come up with by hand: multiple controlled rotations being executed at the same time, many gates being executed at the same time, enormous numbers of the CNOTs canceled. A lot of the CNOTs at the start would also have been canceled against a circuit done before, if there were stuff done before, and so on. So this leads to a large improvement. And this is, again, the kind of thing you can really only do if you have liquid acting essentially like an optimizing compiler. We basically told it the rules for how it could do these manipulations, and then it was able to find these reductions.
There's still probably more work to do, because we're not currently doing the optimal ordering of the terms for all these reductions, and we expect we can improve further. So we got a sequence of improvements in the gate depth. These plots show various numbers of spin orbitals; each point is some particular molecule. Sometimes you'll see the same molecule twice -- like CO2 large and CO2 medium, different bases for the same molecule. And by doing all of these things, we get significant reductions in the number of gates required. So that's the first part. The second kind of gain we got is what we call interleaving. Interleaving is a reordering of the Trotter-Suzuki expansion that reduces the simulation error. We have a strong physics understanding of why it works and good numerical evidence that it greatly reduces the simulation error, but we do not have a mathematical proof of exactly how much it reduces the error. So what kinds of terms do we have? Well, we have these terms h_PP in the problem -- diagonal terms that are just number terms -- and the PRRP terms, which in our notation is a†_P a†_R a_R a_P, a diagonal number-number interaction between two sites. These terms all commute with each other, and the h_PP are actually the largest terms in the problem in terms of their scale. Then you have a bunch of terms h_PQ -- these are the hopping terms -- and these controlled hopping terms h_PRRQ: controlled hopping from P to Q. (I'll get to the PQRS terms in a second.) Now, you have a lot of flexibility as to which single particle basis you work in, and these packages will give you a basis in which the following identity holds -- this is the Hartree-Fock condition: h_PQ plus the sum over all the occupied R's -- occupied in the initial product state, the Hartree-Fock state, that we're using -- of h_PRRQ is equal to 0. What that says is that in this particular product state -- some orbitals occupied and some empty at the start -- there's an amplitude to hop from one to another, but the control from the other occupied sites exactly cancels that tendency to hop. So acting on this Hartree-Fock state, the Hamiltonian doesn't create any single particle excitations. Acting on the state where these are all occupied and these are all empty, you never just hop one electron over; you can create two empty ones and two occupied ones, but you never create just one excitation. This is a particular cancellation, and this basis is picked because it gives a very good starting point for a lot of calculations. And these terms all commute with each other, so one of the things we can do is group the terms. We first execute all the h_PP's and PRRP's -- those are all diagonal, they all commute, and we can do them in any order with respect to each other. Then, for each PQ, we do h_PQ together with all the h_PRRQ's; those all commute with each other for a given PQ, so we can do those in any order, and they tend to cancel out on average. Previously we had done them in different orders -- we had not necessarily done this right next to this.
And so we were getting a lot of extra error: we had terms that on average tended to cancel out, but we would do one and undo it much later, and depending on what sat in between, we got a much bigger error. So by doing this grouping -- doing these, then for each PQ doing these, and then finally doing the h_PQRS's -- we reduce the error by a large amount. What does that mean? Here's the error -- the error between the exact ground state energy and the estimate -- as a function of Trotter number, which is the inverse of the time step in this case. This is simulated for water. This is the standard lexicographic ordering -- order everything lexicographically -- and this is how its error goes down with the Trotter number. And this is what happens with interleaving -- ignore the "diagonal fix"; it's an interesting thing, but I'm not going to talk about it in this talk, even though it will appear on the next slide; I don't think I'll have time. The point is this large dropoff when you go from this curve to this curve. The error enormously drops, and it gives you an over ten-fold reduction in the Trotter number required to get the same accuracy. Why is it important to reduce the Trotter number? You have to get to a certain total time -- the time that's one over your desired energy accuracy -- to do the phase estimation. Obviously, if you can do it in bigger jumps, you get there with fewer gate steps, whereas if each step covers only a small amount of time, it takes more and more gates to get there. There's also a particularly fun thing that's like a renormalization group improvement to this, whose details I'm going to skip. But one thing I do want to mention, because it will be useful on the next slide, is that we define an effective diagonal energy, in the same spirit as Hartree-Fock. You might think that it's the h_PP's that give the cost to put an electron on a site or take an electron out of a site, but roughly you should also add in its interaction with all the other sites that tend to be occupied, and on average that will be something like this. So this omega_P is a guess as to what the binding energy of a particular orbital is on average, given a guess as to how the other sites are occupied. And this will be useful for the thing I'm about to talk about now: a multi-resolution Trotter formula. What this means is that perhaps there's no need for every term to be executed with the same time step. You could execute some terms more frequently and some less frequently. So instead of every Trotter step doing a little bit of each term, you take the big terms and do them a little bit at a time -- a little bit, a little bit, a little bit -- because they're so large; but when you hit a small term, you just do it all at once, in quite a large step, and then keep going like that. There are huge numbers of small terms, so a lot of the terms could be done much less frequently, and this would lead, again, to a large speedup. Our original idea was to use exactly the magnitude of the term as the factor. And it turns out that in practice this did not work.
All kinds of things were tried, and in practice it just did not lead to any improvement. You could try what we called a coalescing value -- coalesce a term by a certain amount, the small ones coalescing more -- and we never got to a situation where we reduced the work without increasing the error. Largely thanks to [indiscernible]'s incredible persistence in trying different ways of doing it, we eventually came up with something that works and that makes a lot of theoretical sense -- again, it makes sense from a physics point of view, but there's no real math as to exactly why it works. Instead of ordering the terms by their actual magnitude, we ordered them by the magnitude squared divided by an energy denominator, in the spirit of second order perturbation theory, where the energy denominator is obtained from the differences of those omegas. This gives us a measure of the importance of each term -- I'm just going to call this quantity the importance. We sorted by this importance, and then the terms of high importance get executed very frequently and the terms of very low importance get executed much less frequently. We obtained at least a ten-fold reduction in gate depth. When I say "at least," this is a little tricky to define: we came up with rules that work for every molecule we could test -- we don't have a quantum computer, so we're limited in the sizes we can test -- rules for how much to coalesce based upon the importance and its distribution for a given molecule. And those rules were giving a ten-fold gate reduction at the sizes we could simulate. We believe we could coalesce even more aggressively at bigger sizes -- it seems to make sense, it's very believable -- but we don't have the ability to really check it by simulation. Perhaps one of the first things we would do if we had a quantum computer is run a 50 qubit molecule, verify that this really works at that size, and then trust that it works at 100 qubits. But I expect that 10 is actually a rather conservative statement, and it's probably going to be a lot more than that. Here is the full set of rules that we used for hydrogen chloride, for example. We have some terms of very large importance here -- and there are a few terms in here which are a little subtle to handle -- and these get done every time. Then you have terms that you do only every 16 steps, every 32 steps, and every 64 steps, as they get less and less important. And again, another thing we can potentially do is coalesce more and more -- not just every 64 steps but every 128, and so on. The important thing to see -- here I have a plot of the distribution of this importance for a variety of different molecules -- is that as the molecule gets bigger, like when you get up to this Fe2S2, there start to be a very few terms, way off on the right, of very high importance. Those are the ones you have to do every time, but everything else is much, much less important. So you would expect an even greater coalescing gain. This is something we need to understand in more detail.
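(An illustrative sketch of what such an importance-based schedule might look like -- the actual rules in the paper are more involved, and the omega denominators here are simple stand-ins: sort terms by |h|^2 over an energy denominator, then execute the weak terms only every 2^k-th Trotter step with a correspondingly larger time slice.)

```python
import math

def importance(h, omega_gap):
    """Second-order-perturbation-style weight: |h|^2 / energy denominator."""
    return abs(h) ** 2 / max(abs(omega_gap), 1e-9)

def interval(imp, imp_max, max_interval=64):
    """Important terms run every step; weak ones every 2^k steps."""
    if imp >= imp_max:
        return 1
    return min(2 ** int(math.log2(imp_max / imp)), max_interval)

def trotter_step(terms, step_index, dt, apply_term):
    """One coalesced Trotter step: a term executed every m steps gets an
    m-times-larger time slice to compensate."""
    imps = [importance(h, g) for (h, g) in terms]
    imp_max = max(imps)
    for (h, g), imp in zip(terms, imps):
        m = interval(imp, imp_max)
        if step_index % m == 0:
            apply_term(h, m * dt)   # e.g. the e^{i h_k t} circuit

applied = []
terms = [(1.0, 1.0), (0.05, 1.0), (0.001, 1.0)]  # (h, omega gap) toy values
for s in range(64):
    trotter_step(terms, s, 0.1, lambda h, t: applied.append((h, t)))
print(len(applied))  # 66 term applications instead of 3 * 64 = 192
```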
But our current simulations are showing -- here's a plot of the reduction in work for a variety of molecules, according to these rules -- and I expect it's going to continue to increase beyond that. Again, I'm skipping ahead a little quickly, because this is a talk I gave in Santa Barbara in an hour and I'm giving in 40 minutes here, so I'm skipping over parts of some slides. So what are the improvements we've gotten? Well, we started, as I mentioned, with this power of N to the 11 -- and there's also a factor of 10 to the 4 for phase estimation. There's the cost of a single Trotter step, and then there's the question of how many Trotter steps you have to do -- what total time are you getting to? Are you getting to a total time of 1, or to a much bigger total time? That gives you some constant overhead if you're aiming for a constant target accuracy. The improvements were canceling Jordan-Wigner strings, nesting, reordering, interleaving, and then this coalescing trick. The bottom line is an asymptotic improvement by N squared, plus lots of other improvements that we can't express as powers of N -- like, I don't know if coalescing is N, N squared, or what, but it seems to be large. And the final scaling depends a lot on the required Trotter number. So this is the last thing I want to talk about, and it's very important for understanding the overall scaling. The bounds I mentioned were very pessimistic -- this N to the 6 -- and they had the required Trotter number increasing rapidly as the molecule size went up. However, simulations on a variety of different molecules do not show the error going up in any way as the molecule size increases. These are plots of the error as a function of Trotter number for a variety of different molecules. There is a lot of structure and detail in there, but there's no clear trend with the number of orbitals -- and that includes simulations on smaller ones; there's no clear trend. So I want to talk a little bit about better bounds on the error. Previous work, as I mentioned, just used things like the number of terms. We have a better bound, which expresses the error directly in terms of sums of norms of commutators -- commutators of two terms with each other, like h_PQRS with h_ABCD with h_EFGH. We can just directly evaluate this bound and see what we get. This relies on a specific bound that's not just good to lowest order. There's one way of estimating the error, which I'll show in a slide or two, based on expanding the error out to order dt squared; but this is a bound that's valid not just as a lowest-order expansion -- it holds to all orders. And you can go directly and evaluate what these norms are. One of the things you find is that because the bound is in terms of norms of these commutators, and each term involves four different choices of orbitals -- so there are N to the 4 choices when you count the first one -- in order for the second term not to commute with the first, one of its orbitals has to overlap with one of the first term's. So there are actually only N to the 3 possible choices for that term, and again N to the 3 possible choices for the third.
So that's one of the improvements: many of the terms commute with each other. When you take that into account, that's where the N to the 10 comes from: N to the 4 choices for the first term, N to the 3 for the second, and N to the 3 for the third give a total of N to the 10 possible choices, and this gives you a Trotter number scaling as N to the 5. However, another important improvement is that, while the number of terms gets larger as these molecules get bigger, the number of big terms does not really get larger. You get a lot more terms, but a lot of them are small, and all of those small terms don't contribute much to the error. This gives you another large improvement. When you actually plug the numbers in -- either into the previous, all-orders bound I showed, or into this bound here, which is a lowest-order expansion; they roughly agree, this one having better constant factors because it's just the lowest-order one, the other being somewhat worse in its constants because it holds to all orders, but they give basically the same estimate of the error -- you find that many of the terms in the commutator are very small, just because those particular terms in the Hamiltonian happen to be very small. There are some further averaging effects -- I'm down to three minutes, so I won't go over them -- which improve it even further: adding up all the error terms as an upper bound is very pessimistic, and in reality we expect them to largely average out. So this leads to many further improvements, and our guess is that the Trotter number is going to be around N to the 1, or N to the 2 at most -- really growing quite slowly. So I want to conclude with a guess at the time required to simulate a molecule like Fe2S2. I mentioned this molecule: it has a basis of an interesting size -- not a ridiculous number of logical qubits, but outside the range of what a classical algorithm can do. It's certainly outside the range that any exact classical algorithm, based on just diagonalizing the Hamiltonian with Lanczos-type methods, would ever be able to do, and it's been used by IARPA as a test case. And this is going to be done a little in the spirit of a Fermi problem, like counting how many piano tuners there are in Chicago -- I'm not going to get the exact number; these are just guesses. Okay, so what do we have? From before, about 10 to the 7 is the gate depth per Trotter step. It's actually currently more like two times that, but I think we can probably reduce it a lot by better term ordering. If we say we want milliHartree accuracy -- a Hartree is about 27 electron volts -- we think we need a Trotter number of about 10. That's based on the scaling bounds and the Trotter numbers that suffice for smaller molecules. Maybe it's more than 10; I'd be quite confident it's not more than 20. That is to say, a time step of about a tenth of an inverse Hartree. And what's the total time we need? We need a total time of about 1,000 inverse Hartree to get to milliHartree accuracy. And we coalesce by at least a factor of 10, probably more. The arithmetic is collected in the sketch below.
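(Collecting the back-of-the-envelope numbers in one place -- these are the figures quoted in this estimate, including the 10 nanosecond logical gate time assumed just below.)

```python
gate_depth_per_step = 1e7   # gates per Trotter step (current circuits)
time_step = 0.1             # inverse Hartree per Trotter step
total_time = 1000.0         # inverse Hartree, for milliHartree accuracy
coalescing_gain = 10        # conservative factor from coalescing
gate_time = 10e-9           # seconds per logical gate (optimistic)

steps = total_time / time_step                         # 1e4 Trotter steps
gates = gate_depth_per_step * steps / coalescing_gain  # 1e10 gates
print(gates * gate_time)                               # -> ~100 seconds
```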
Now, for a bunch of these you might say, well, you're not really at 10 to the 7 right now, you're at 2 times 10 to the 7 -- but I'm sure we're going to coalesce by much more than a factor of 10. Here's the extremely optimistic number: we'll assume that gates take a logical time of 10 nanoseconds. If you want to plug in a different number, just multiply my final answer by whatever you plug in there. And what you find is that it would take roughly 100 seconds. If you want microHartree accuracy, it would be much slower. But this is actually a very interesting, very short timescale. Even if it were quite a bit larger than that -- not minutes, which is what 100 seconds is, but hours or days -- you'd be able to do very interesting and very useful things that you could not do in any other way. So I want to conclude -- you can read the conclusion -- with one interesting estimate. Take that 100 second timescale, convert the target energy accuracy to a frequency with Planck's constant, and then convert that to a time, and ask how much slower we are than nature: you find that we are about 10 to the 14 times slower than nature. I don't know if that's a lot or a little; it's just interesting to think about. >>: The more you can go. [laughter] Matt Hastings: Okay. Thank you. [Applause] >>: So you discovered a lot of optimizations specific to this problem. Can you infer from this that some of these optimizations might also be useful for other classes of problems? In the world of designing quantum circuits or quantum algorithms we could have optimizations that are not problem specific, optimizations that are problem specific, and maybe some in-between classes of optimizations. Matt Hastings: Yeah, that's a great question. I think we all took that lesson from this. Being able to actually test this stuff is great: you can see how some [indiscernible] minor improvement does nothing, or how something theoretically great leads to essentially no improvement -- really fun to think about, but it doesn't lead to much at all -- while some other things lead to large improvements, and you can check it all. I think there is a large amount to be done there, and we should maybe start taking more of that approach, using simulation to do it. >>: One other question. So you know in some scenarios basically [indiscernible], right, so actually we can [indiscernible]. Matt Hastings: Yeah, that's a good question. The CNOT-string cancellation does not reduce the total number of single qubit rotations required, and those certainly might be the most costly. It depends upon your platform -- on some platforms, like the Fibonacci one, the CNOTs themselves will also be kind of hard to do -- but certainly on many platforms the rotations are the costliest part. However, the CNOT reduction is what allowed us to do the nesting, and that leads to a reduction in the parallel depth of the single qubit rotations, which would still likely be useful on many platforms. So there's a reduction there.
And otherwise, all the other improvements, having to do with term ordering and coalescing, directly reduce the number of single qubit rotations by the same amount, so that helps too. >>: So the empirical reduction in the error for a given number of Trotter steps -- were you able to fully explain it using the improvements, or is there some kind of [indiscernible]? Matt Hastings: The empirical reduction in Trotter error, were we able to explain -- >>: In the simulator, does the math back up exactly what we saw? >>: He's wondering if there's maybe more headroom that isn't explained yet theoretically, that we saw in the simulator. We can't go big enough in the molecules to really get any closer; I think we're about as close as we can get. Matt Hastings: Yeah, currently those bounds -- the upper bounds -- give numbers that are significantly higher than the actual error. So yes, there are some subtleties: if the bound tells you it scales one way and you're down here, it's possible that the actual error will come up, hit the bound, and start tracking it, so you might wonder. But the reality is that the empirical numbers are not only lower than those bounds, they're scaling better. Yes, I think there is probably more to understand, but we're certainly, you know -- Krysta Svore: Great. Let's thank Matt again. [Applause]