Krysta Svore: So welcome back. Now we're going to hear from Patrick Hayden on the
role of quantum information on space and time.
Patrick Hayden: All right. Well, first I'd just like to thank Krysta and Microsoft Research
for the invitation to be here and attend the summit. The past couple of days have been really
interesting. And I've really enjoyed getting to meet some of the people in the quantum
architecture's group. It's a very different and encouraging approach, I think, to quantum
computation, to building a quantum computer, than the one I'm used to. I'm used to talking to a lot of
theoretical computer scientists and theoretical physicists. And to talk to people who have
actual, honest, good skills making computers work, I think this is obviously a crucial part of the
enterprise of building a real functioning quantum computer. So I really like what's happening
here at Microsoft. The other preface is that the topic of my talk, you may have noticed, is a little
bit different than what was circulated the first time in the notes that some of you have. I was
going to speak about a topic that was properly quantum computation, and when I arrived here
Sunday evening, a Microsoft researcher, who shall remain nameless, expressed dismay that I
decided not to speak about a topic related to quantum gravity. So I felt sheepish. So I'm going
to tell you a little bit about the intersection between quantum information and the quantum
mechanical physics of space and time. This is clearly quite
distant from what we were discussing this morning, but I think it's one of the virtues of
quantum information as a field that it really spans what will hopefully become practical,
all the way to the truly fundamental. And the same people can, and often do, work
wearing both hats, depending how they feel when they wake up in the morning. So for
those of you who don't work -- and I think most of you don't work in quantum gravity or such
areas, you can get a bit of a flavor for how quantum information can inform the way we think
about some questions -- you know, interesting questions in fundamental physics. And so there
are going to be -- I'm basically going to try to illustrate this claim that quantum information is
useful, or can teach us interesting things about the structure of space time through two
examples. And the first one is essentially I'm going to give you a complete characterization of
all the ways that quantum information, you know, qubit-type stuff, can be distributed in space
and time. And the answer is going to turn out to be beautiful and simple. And the second half
is going to be that I'm going to provide you with an information theoretic interpretation of the
length of a curve in space. Now, that may not mean anything at all to you at this point, but
hopefully by the end of the talk what it would mean to have an information theoretic
interpretation of the length of a curve will -- at least hopefully you'll get that much out of it.
So let's start with some quantum information bedrock, something that I think everyone in the
room should be able to agree on and be familiar with. And that is that cloning of quantum
information is impossible. So if you had the quantum system in an unknown quantum state,
which we call phi, you cannot build a machine that will produce two quantum systems both in
that same quantum state phi that would work for all phi. And this is a very -- well, I guess we
could phrase this another way, if we wanted to, that quantum information cannot be replicated
in space. And this is really a crucial feature of quantum mechanical information, because it is a
consequence of this no-cloning theorem that quantum information cannot be measured
without causing disturbance. And that is really at the root of all of the cryptographic
applications of quantum mechanics. And I guess we heard a few of them this morning.
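As a one-line sanity check of that bedrock fact (the standard linearity argument, my addition rather than anything on the slides): if a single unitary U satisfied U|\phi\rangle|0\rangle = |\phi\rangle|\phi\rangle for every |\phi\rangle, then for any two states

\[
\langle\phi_1|\phi_2\rangle
  \;=\; \big(\langle\phi_1|\langle 0|\big)\,U^\dagger U\,\big(|\phi_2\rangle|0\rangle\big)
  \;=\; \langle\phi_1|\phi_2\rangle^{2},
\]

which forces \langle\phi_1|\phi_2\rangle to be 0 or 1. Only orthogonal or identical states could ever be cloned, so no machine clones an arbitrary |\phi\rangle.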
So quantum information cannot be replicated in space. But this is a talk about
spacetime. So let's think about this from a slightly more global perspective. So here's a
spacetime diagram, the horizontal axis is position, the vertical axis is time, and on my
diagram -- well, it won't matter, but light travels at 45-degree angles, upwards and to the
right and upwards and to the left. So if we have some quantum information that's localized, so
you know, at a point, and let's say it's a spin and that spin has moved around a bit as a
function of time, then it traces out a trajectory in spacetime like so. And kind of obviously,
right, the quantum information follows the spin around. And so voilà, we have the same quantum
information replicated at many points in time. It's kind of trivial but it's true. This is what
happens. And in fact not only is it possible for quantum information to be replicated in time,
but it's necessary. The quantum mechanical evolution law is unitary. Unitary transformations
are reversible, they don't destroy any information. And so while quantum information cannot
be replicated in space, it has to be replicated in time. You have no choice. And so this first
half of the talk is just going to be about taking that observation. Once we know that, okay,
quantum information can be replicated in this context, and try to figure out exactly what types
of configurations are possible -- Yeah?
>>: I've often been curious, how did this quantum
bit get generated in the beginning?
Patrick Hayden: So that's a good question. There
are different ways of formulating an answer, like
there are different what turn out to be equivalent
ways of answering that question. So one way of
formulating it would be to say that there are some
kind of referee or adversary who prepares the system
and doesn't tell you what state he or she prepares
it in. And then afterwards when all is said and
done is going to perform some test to see whether
you've succeeded at whatever task is defined.
Another way of defining it would be to say that that
system is actually entangled with another system
that we don't talk about, we'll call the reference,
and we check to verify that the entanglement is
preserved.
>>: Beyond the [indiscernible] from a physics
point of view, [indiscernible] two groups of
physicists are arguing which came first, space or
time, right after the big bang, and each of them
thought the other were dummies for not seeing it
according to their view. But I've often wondered,
where did quantum bits originate in the creation of
our universe?
Patrick Hayden: Oh, my. Okay. Well, that would
take us very far afield. I would be very happy to
discuss it, but I think that for the moment we
should defer trying to answer. I think we could all
discuss that question with passionate opinions.
So I think I told you what the goal is. So let's
just formalize this a little bit more. What am I
going to mean by localizing quantum information in
different regions of spacetime? I first have to
tell you what regions of spacetime we're interested
in. And so here I have -- well, a pair of regions,
each labeled by a pair of points. The lower point is,
say, y and the upper point is z. And I'm going to
define something called the causal diamond. And the
causal diamond is the intersection of all the points
in the future of the y -- so this is a spacetime
diagram, light travels at 45-degree angles, so
everything that could be affected by what happens at
y is inside this vee -- and the intersection of that
with the past of the corresponding z. So
everything that could have affected z that comes
from the past.
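In symbols, the definition just given is (writing J^{+} and J^{-} for causal future and past, standard notation rather than necessarily what was on the slide):

\[
D_j \;=\; J^{+}(y_j)\,\cap\,J^{-}(z_j),
\]

the set of points that y_j can influence and that can in turn influence z_j.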
So I have two of these causal diamonds there.
Another way of phrasing this is that this causal
diamond, Dj, consists of all of the points that can
be affected by something that occurs at y and can
also affect the outcome of -- or the state of z.
Well, I was going to ask you a question. So suppose
we have some quantum information. Now we're going
to ask. Can these two causal diamonds both contain
the same quantum mechanical information? Is that
something that can be done?
And in this case it can be done and it can be done
very trivially. So if we just have the system
prepared at some point in the distant past, in this
state S, then I can just carry S to this point P.
And at that point P is actually inside diamond D0,
so it's been localized to diamond D0, and then we
just redirect it again along this curve, which is
45 degrees, so it's not violating relativistic
causality, and it ends up in diamond D1. So in this
case we have succeeded in replicating the
information of these two diamonds.
And I guess we can also say conversely, if these two
diamonds were such that all of the points of diamond
0 were spacelike separated from all the points in
diamond 1, meaning it was impossible to send a
message, it would violate causality, then it would
be impossible. Because if we had succeeded we would
have violated the no-cloning theorem.
So in this very simple setting with two diamonds in
one space dimension, one time dimension, the story
of information replication is very simple and
uninteresting. All we have to say is that if
we define a relation, the two diamonds, D0 and D1
are causally related if, and only if, it's possible
in principle to send a message from one to another.
I don't care if it's a message from D0 to D1 or D1
to D0, then I can replicate the information if, and
only if, they're causally related. And that's the
story. And we do it in this trivial way.
But, again, if this were the whole story, this would
be sort of an uninteresting topic. So now you
should stop and think, okay, as connoisseurs of
quantum information, what interesting quantum
information phenomena are you familiar with? Well,
you're familiar with probably teleportation in which
quantum information is somehow split. It doesn't
follow a trajectory, but part of it is carried in
these classical bits and part of it is carried in
entanglement. Probably familiar with quantum error
correction in which somehow quantum information is
delocalized in such a way that we can't learn
anything at all by looking at some -- a few qubits,
we have to look at a large number simultaneously.
So more generally, clearly we're going to have to
have a more sophisticated answer.
And I just want to point out that while I don't want
to get into the details in this talk of exactly what
I mean by replicating the information, I do want
to -- I just want to emphasize that it can be turned
into an operational question, where it's not just
kind of vague notions of replicating information.
If you want to make it operational, you could
imagine your referee stationing agents at these Y's.
And from time to time one of the agents will just
decide, okay, I want to check. You claim the
information is actually present in diamond D0, show
me the money; show me that it's really true. So if
this agent decides to check what's going to happen,
well, he can sort of -- I can take his request and
race into the future, you know, try to access all
points in the future. So I can access all points
that are part of D0 in the future. So I can race to
the point P and grab the particle that was at point
P and redirect it to z. And so the information
appears at z, and then I can confirm to you that
the information was in the diamond because I can
actually present it to you. So I can turn this into
a kind of game in which information replication
corresponds to winning at the game and failing to
replicate the information corresponds to losing
the game.
And just a little bit about this causal diamond
geometry. I drew the base point and the top point
as having the same spatial coordinate, but that kind
of privileges a certain set of coordinates in a way
that really isn't in the spirit of special
relativity. So as long as the upper point is in the
future of the lower point, that gives me a perfectly
good causal diamond. And you should observe that as
the upper point gets closer and closer to being
lightlike separated, so to being on a light ray from
the bottom point, the diamond appears to get thinner
and thinner. And in the examples that I'm going to
show -- I guess the only examples I'm going to show,
just because they're easier to draw, my diamonds are
actually going to look like line segments. But you
should think they're not actually line segments,
they're just very long and very thin. So just keep
that in mind.
So this is the example. My favorite example, I love
it. A few people in the room have seen it. That
really illustrates what you need to understand if
you want to understand how information can be
replicated in spacetime. And almost everything you
need to understand. So I call this the causal
merry-go-round. I have three causal diamonds. So
three regions of spacetime. And let's see how
they're arranged here. My Y's are on the vertices
of an equilateral triangle slightly crushed by
aspect ratio here. And my z's are on the midpoints
of the edges between those vertices. But
they're shifted in time. So you see the vertical
axis is time here, and I have two spatial
dimensions. And they're shifted in time just long
enough for a signal to get halfway along an edge.
Right? And so the causal diamonds are in fact these
degenerate line segments that are like light rays.
And in fact the red arrows are also light rays,
because of the symmetry of the situation.
And I would like to replicate the information. You
know, the question is can I replicate the
information. And the structure here, you know, it's
designed to be, I said, this causal merry-go-round.
Because there's a point in the 1 diamond that's in
the future of the 2 diamond, and there's a point in
the 0 diamond that's in the future of the 1 diamond.
There's a point in the 2 diamond that's in the
future of the 0 diamond. Right? But somehow this
relation -- this property's intransitive. There's
no point in the 2 diamond which is in the future of
the 1 diamond. So you can go one way or -- yeah, so
you can't compose this property.
And so if we're going to try to replicate the
information naively, what would we do? We might say
okay, start with this information, which is
localized at S, and let's just carry it to, say, Y2.
Right? And so the information has then entered the
2 diamond. So I put the information in the 2
diamond and then I could maybe carry it along this
light ray to the point z1. And then the
information would also be in the 1 diamond. But
then I'm sunk. Because from the 1 diamond there's
just no way to get to the 0 diamond. So the 0
diamond is left out. So I can get the information
into two of them but not into three. And it seems
like here you're kind of really up a creek; right?
But there is a way to make this work. And the way
to make this work is to encode the quantum
information into a particular quantum error
correcting code. And I apologize for the
nonstandard notation to the people who are into
quantum error correcting codes. What I mean here is
a code consisting of three particles, such that if I
collect any two out of the three I'll be able to
reconstruct the quantum information. So such codes
exist. I know that the quantum architecture people
are quite familiar with quantum error correcting
codes. If you know qubit codes, you may know the
further fact that there is no qubit code with this
property, but there's a qutrit code. And that's
perfectly good for our purposes.
And what we're going to do, once we've encoded the
quantum information that was originally at this
point S into three particles, such that with any two
out of the three we can reconstruct the information,
we'll just send one particle to each of the Y's.
And then from the Y's, we'll have the particles
traverse the red light rays. So that's consistent
with relativity. And you may have missed it, but
now two different particles have passed through each
of the causal diamonds.
So let's just rewind that to make sure that you see
what happened. So at the end point, of course,
every diamond contains one particle. And at the
beginning point every diamond contains one particle.
But they're not the same ones. Because the
particles travel along the red light rays moving
from one diamond to the next. So each diamond
contains two out of the three particles and two is
enough to reconstruct the quantum information. And
voilà, the quantum information is replicated in each
of these causal diamonds. So that's a kind of
simple, nice example. But it obviously doesn't
cover the general case. You could have all kinds of
crazy configurations of causal diamonds in
spacetime.
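As a concrete sketch of the encoding step in the merry-go-round protocol: the talk doesn't name the qutrit code, so the snippet below uses the standard three-qutrit threshold code of Cleve, Gottesman, and Lo, in which any two of the three shares determine the logical state. The numerics verify the complementary fact that any single share is maximally mixed and so carries no information on its own.

```python
import numpy as np

# Three-qutrit code with logical basis states
#   |j>_L = (1/sqrt(3)) * sum_k |k, k+j, k+2j>,  arithmetic mod 3.
# Any two shares suffice to reconstruct the logical qutrit; equivalently,
# every single share is maximally mixed, as checked below.

def encode(psi):
    """Encode a logical qutrit amplitude vector psi into three qutrits."""
    out = np.zeros(27, dtype=complex)
    for j in range(3):
        for k in range(3):
            idx = 9 * k + 3 * ((k + j) % 3) + ((k + 2 * j) % 3)
            out[idx] += psi[j] / np.sqrt(3)
    return out

def single_share_state(state, share):
    """Reduced density matrix of one share of a three-qutrit pure state."""
    rho = np.outer(state, state.conj()).reshape([3] * 6)
    for q in sorted(set(range(3)) - {share}, reverse=True):
        rho = np.trace(rho, axis1=q, axis2=q + rho.ndim // 2)
    return rho.reshape(3, 3)

psi = np.array([1.0, 2.0j, -1.0]) / np.sqrt(6)   # arbitrary logical state
enc = encode(psi)
for share in range(3):
    print(np.round(single_share_state(enc, share), 6))  # each is ~ I/3
```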
So let's look at a slightly more crazy
configuration. So here we have four diamonds. And
the picture is -- or the setup is that each of these
diamonds is on one of the faces of the cube. And
again the vertical direction is time. And the
question would be, okay, can we replicate the
information here? And, you know, if you're like me,
presented with this scenario, I kind of threw up my
hands and said I don't know. What could I check? The
one thing that I know is that I shouldn't violate
the no-cloning theorem. So it should be true that
every pair -- if I'm going to succeed in replicating
the information, every pair of these diamonds should
be causally related. There should be some way to
send a message from one to the other. Because if
there is any pair that was actually causally
unrelated, meaning all the points were spacelike
separated, then to replicate the information in them
would mean that I'd cloned the information, that the
same information was at two spacelike separated
points. And that's the first thing to check.
Well, it turns out that every pair of these diamonds
is causally related. There's always a way, for each
one of these -- for example, from the 0 diamond, I
can send a message to the 1 diamond, maybe from the
3 diamond, that Y3 point, I can send a message to
the 0 diamond and so on. So every pair is causally
related. And so the obvious thing to check, the
violation of the no-cloning theorem doesn't rule out
this picture.
So what comes next? What's the next most
complicated constraint? I couldn't come up with
any. Neither could my student, Alex May. And
eventually we proved a theorem that in fact the
no-cloning theorem is the only constraint on the
replication of information in space and time. So if
you have any configuration of causal diamonds, they
can contain the same quantum information if, and
only if, every pair is causally related. And the
equivalent way of formulating this is to say
that each of these diamonds can
contain the same quantum information if, and only
if, there is no obvious violation of the no-cloning
theorem. And it didn't have to be this way.
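Stated compactly (my paraphrase of the theorem just described):

\[
|\psi\rangle \text{ can be replicated in every one of } D_0,\dots,D_{n-1}
\;\Longleftrightarrow\;
\text{every pair } D_i, D_j \text{ is causally related,}
\]

where two diamonds are causally related when some point of one lies in the causal future of some point of the other.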
Yeah?
>>: Is it true that this [indiscernible]?
Patrick Hayden: Like a trivial consequence of
Lawrence and --
>>: No, is it trivial that this definition or
construction is [indiscernible]? Because like you
want --
Patrick Hayden: Yes, well, I don't know if it's
trivial. But the reason I talk about causal
diamonds is because in a relativistic theory, you
can attach a density operator to the causal diamond.
And basically you can foliate the diamond by
spacelike hypersurfaces and there's a unitary time
evolution from one to the next. So there's
effectively just a single density operator for the
diamond. And so the question of whether the
information is there is whether there's a fixed
unitary transformation that will perform the
decoding. So yeah, it's well defined.
Okay. So the answer, it turns out, is very simple. And
I don't know really what message to take from this,
except that it didn't need to be so pretty. And I
take that to be some kind of indication that there's
some beautiful compatibility here between quantum
information and, I guess, relativistic causal
structure.
Just as an aside, most of you will not be familiar
with this, but there has been a controversy raging
in the quantum gravity string theory community over
the past couple of years -- well, actually over the
past 40 years, but it flared up in the past couple
of years, about the fate of information in black
holes. And people, myself included, have gotten
themselves completely wrapped up in knots and
confused in thought experiments that actually
involve the cloning of information in spacetime.
And black holes have this annoying property of
seeming to become cloning machines. And so because
we got so confused along those lines, that was
really the motivation for this work, to try to
understand the replication of information in
spacetime in a situation that's much less exotic. Just Minkowski
flat spacetime. But even there, there are surprises
and much more interesting structure than you might
have thought.
So this conclusion for Part 1 is that there's a
surprising variety of different ways to replicate
quantum information in spacetime. This discussion
contains as a special case the theory of quantum
secret sharing that some of you may be familiar
with. It interfaces with the theory of quantum error
correcting codes. And of course I would love to
convince some quantum optics people to build this
thing. And we're working on that.
Yeah?
>>: So we know there are various types of error
correction codes, and they have different
[indiscernible]. I wonder what those theorists say
about your problem of [indiscernible].
Patrick Hayden: Well, if I were going to give an
entire talk about the subject, what I would do is
I would explain to you how to do the general case.
And to solve the general case, it turns out that the
natural way to do it is that you need a quantum
error correcting code with some very interesting
properties. So you're going to be correcting for
losses, but you're actually going to lose -- there are
going to be N squared shares to the code, and you're
going to lose all but about N of them. So you're going to
recover from -- you're going to recover from losses,
where you lose almost all your qubits.
And in the usual quantum error correcting codes that
people talk about, of course that would be
impossible, because it would mean that you would
clone. So the different qubits in the code are not
treated on the same footing. There's some internal
structure. So we've been playing a lot with these
codes and I'd love to discuss them with you.
And right now we're actually trying to -- or we've
made continuous variable versions of these codes,
because we want to convince quantum optics
experimentalists to do this. And it's actually
easier to do it as a continuous variable code than
as a qubit or qutrit code.
But, yeah, if you just use the usual parameters, you
would conclude that this is kind of impossible. Or
you would do it using a recursive construction in
which the number of qubits per diamond ends up
being -- well, if there are N diamonds, something like
N factorial. So it would be just absolutely
impossible. But with good codes you can make it
reasonably efficient.
So that was Part 1. There was no gravity in Part 1,
it was just the causal structure of special
relativity and how it interfaces in interesting ways
with quantum information.
So Part 2 is about the connection between
information theory, quantum information theory, and
holographic spacetime. And I promised you some kind
of quantum information theoretic interpretation of
the length of a curve, whatever that means.
So what is this holographic principle? The idea
proposed by Susskind and 't Hooft back in the 90s
is that all information in a region of space can be
represented as a hologram living on the region's
bounding surface. And that's kind of some words,
you know, what would that mean? Well, at some level
it sounds crazy, right, but if you have a solid with
a bunch of particles arranged like this, and each
particle had some number of states, say like K
states per particle, and clearly the total number of
states we have available here is going to grow
exponentially with the volume. And so the entropy
of this thing is going to be proportional to the
volume.
And, well, the statement of the holographic
principle is that this is actually wrong, that in a
volume of space in fact the number of states is only
going to grow exponentially with the area. And that
sounds crazy, but it's not. And the reason is that
the most entropy dense object that can exist in the
universe is a black hole. And the entropy of a
black hole is proportional not to its volume, but to
its area. So if you try to cram a bunch of
information into a volume of space, eventually the
thing is going to collapse to a black hole. And the
number of bits that you've stored there is not going
to be proportional to the volume of that region of
space, but to the area. Because it's going to be
the entropy of the black hole. And of course --
>>: The area of the boundary?
Patrick Hayden: The area of the boundary, yeah.
And so I guess a way of thinking about this is that
somehow the universe is not built out of Lego
bricks, the universe is built out of shadows. If
you think about Plato's cave where you try to infer
what's happening, infer reality by only seeing the
shadows playing on the wall, in fact all there is is
shadows. But that's kind of mumbo jumbo.
We can make this -- or Juan Maldacena back in '97
provided us with a concrete realization of this
holographic principle. And we don't need to know
any detail about this really, but the idea is that
the physics of -- the quantum gravity physics of a
d+2 dimensional, so I have one time dimension, and
d+1 space dimensions, this bulk, right, is
equivalent to some physics without gravity living on
the boundary of that spacetime. So the physics
without gravity turns out it's a quantum field
theory with a special symmetry, conformal symmetry;
they call it a CFT, conformal field theory.
But this actually realizes the holographic
principle. Because all of the physics of this bulk,
this d+2 dimensional thing, is completely equivalent
to physics of the boundary, which has one less
spatial dimension. And so -- and there's a
dictionary that allows you to go back and forth
between questions about the boundary theory and
questions about the bulk. And it sounds very kind
of abstract. You can just think of the boundary
theory as being some material. I know Dave Wecker
yesterday was talking about simulating the Hubbard
Model. You can think of it being some slightly more
exotic version, something like the Hubbard model
near a phase transition. And that's what this
material at the boundary looks like.
And it turns out that if you want to solve problems
about this material, in many cases you can translate
it into a quantum gravity question. And maybe the
hard problems about your material become relatively
easy general relativity problems. They'll look
completely different. So this is what's called the
AdS/CFT correspondence.
Yes?
>>: I'd like to offer a quick thought experiment. So
there is a [indiscernible], there is a surface. And
this basically says that the surface describes
everything.
Patrick Hayden: The surface is --
>>: Describes all the information.
Patrick Hayden: Yes, okay.
>>: But we can do something like X-ray tomography
on this volume, and quickly discover that there are
some interior structures that are really there, that
are important. So how would you actually reconcile
the view of --
Patrick Hayden: So the question is about counting
the number of degrees of freedom. So you wouldn't
deny that the bulk exists. We live in the bulk --
>>: No, I'm not so sure.
Patrick Hayden: Yeah. But the point would be, what
you thought were independent degrees of freedom,
right, that if you tried to align atoms in some
rectangular prism, like fill up some volume with
atoms, you would have thought that your atoms were
independent degrees of freedom. The issue is that
when you pack them densely enough, gravity causes --
gravity becomes important. And ultimately they
collapse to black holes. So if you write down the
big list of all possible states of the system, it's
much smaller than you thought it was. So that's
really the point. It's not that the bulk doesn't
exist. Or maybe some people would argue it doesn't,
but --
>>: Is it similar to a unique solution for
[indiscernible] problems?
Patrick Hayden: In some ways, yes.
Okay, so just a final point about this AdS/CFT
correspondence. If we look at one slice of time, then
the -- in order to make this precise and make it
work, the geometry of what this time slice looks
like is that it's a negatively curved space of
maximal symmetry. And so this is hyperbolic space.
And if you want to measure distances between points,
this is Escher's -- one of Escher's renditions of
hyperbolic space, what you do if you want to measure
the distance between this point and that point is
you think about all the different possible ways to
get between them and you find the one that contains
the fewest fish. So it's the fish-counting metric.
And you can see the fish are big in the middle and
they get smaller and smaller out toward the
boundary. In fact the fish become infinitely small
out toward the boundary. So that's how you measure
distances in this world.
Okay, I might actually have time to do this. We're
going to talk about -- this is going to be about
information theory. So what kind of natural
information theoretic quantities might be lying
around? Well, entropy. The information theoretic
quantity par excellence. And so we're going to
think about being in just the non-exotic boundary
theory, this material, and we want to calculate the
entropy if we just look at some part of it, how much
uncertainty do we have about the state.
So, again, I'm not sure about the background of the
audience, but if you have a quantum mechanical state
and you have some uncertainty about that quantum
mechanical state, then the correct way to describe
it is with something called the density operator.
And the density operator is Hermitian, it has eigenvalues,
and the eigenvalues are non-negative and sum
to 1. So you can think of them as a probability
distribution. And the entropy of that region is
nothing but the Shannon entropy of that probability
distribution. The Shannon entropy, I imagine,
all the computer scientists are familiar with, from
Huffman coding and whatnot. And so that's what
we're actually talking about. And it's a measure of
the uncertainty of that region or its entanglement
with the rest of the world.
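In formulas, with \rho the density operator of the region and \{\lambda_i\} its eigenvalues, the entropy he means is the von Neumann entropy

\[
S(\rho) \;=\; -\operatorname{Tr}\rho\log\rho \;=\; -\sum_i \lambda_i\log\lambda_i ,
\]

which is exactly the Shannon entropy of the eigenvalue distribution.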
So here we have an interval of the boundary theory,
which I'm going to call A, and we want to figure out
what's its entropy. Now, a number of people here
have worked on simulation or condensed matter
theory. And this is a well-defined calculation.
There's a great big enormous operator. You want to
diagonalize it, find its eigenvalues, evaluate this
function. And sometimes people can do it in heroic
tour de force calculations. But we're going to see
what the dictionary of AdS/CFT tells us is the easy
way to calculate it.
And in this holographic dictionary, the answer is
that this entropy is a constant, 1 over 4
times Newton's constant, times the minimal area of
some object. And because here -- so my objects are
these curves, these gamma As that start and end on
my intervals, my interval A on the boundary. So my
interval A on the boundary has two end points. So I
think about curves that penetrate into the bulk and
terminate on those end points. And there are a
whole bunch of such curves. I just find the one
that has minimal length, which is what is area in
this context. And this entropy is nothing other
than the length of the minimal curve -- well, the
minimal length among all curves. And that's,
generally speaking, a much easier calculation.
Instead of trying to diagonalize some matrix, which
is probably 10 to the 30 by 10 to the 30, all you
have to do is a little bit of simple geometry. You
know, you write down the geodesic equation, you solve
the geodesic equation, and you evaluate the
length.
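Written out, this is the Ryu-Takayanagi prescription; in a 2+1-dimensional bulk, where the "area" of a curve is its length, it reads

\[
S(A) \;=\; \frac{1}{4G_N}\,\min_{\gamma_A}\operatorname{Length}(\gamma_A),
\]

with the minimum taken over bulk curves \gamma_A anchored on the endpoints of the boundary interval A.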
And so this is generally much, much easier, and it
gives you the right answer. And I mention -- well,
I don't know if I did mention. I did mention that
the entropy of a black hole is proportional to its
area. And this formula generalizes this fact. And
it generalizes it in the following way. If instead
of talking about an interval in my boundary theory,
I just take the whole boundary, right, then you have
to ask, well what are the curves that I'm allowed to
minimize over? Well, it's all of the curves that
sort of wrap around in the interior. And the one with
the minimal length is the one that gets hung up on
the black hole itself, so its length is nothing
other than the length of the black hole in this
case, or black hole horizon, which is its area in
the appropriate sense.
So this was proposed by Ryu and Takayanagi back in 2006,
I think, and it's passed many tests. And I can't
say it's proved but it's understood why this should
be true.
Okay.
>>: What happens to string theory? This really
needs string theory to reconcile --
Patrick Hayden: Oh, excellent question. So I'm
not -- I'm actually not an expert on string theory
at all, but so Maldacena arrived at this proposal
using string theory. So he looked at the same
string theoretic system in two different limits and
he got two very different looking descriptions, but
because they both came from the same underlying
string theory, his conclusion was that they should
be the same thing.
>>: That's one way to prove theories.
Patrick Hayden: Yeah. Exactly.
But this has actually become an industry. People
use AdS/CFT, this correspondence, without knowing
any string theory. Like it's become quite a popular
thing to think about applications of AdS/CFT in
condensed matter. Like trying to understand exotic
condensed materials using this correspondence, that
may be thinking -- trying to understand your
material and in this dual gravitational picture is
much simpler. I think Ragu has actually done some
work like this.
>>: So the argument could be raised that the
presence of matter causes gravity by virtue of the
entropy that's possible because of the large
configuration space that it makes possible.
Patrick Hayden: Can you repeat that?
>>: The reason gravity is associated with presence
of matter is because matter is the way to build a
high entropy via configuration.
Patrick Hayden: Okay. Yes. Provisionally.
All right. So now, entropy, you know, as a starting
point, is already kind of an information theoretic
quantity. But again there are a lot of computer
scientists in the room. And you probably have a gut
feeling for what entropy is. It gives you the
optimal compression rate for a sequence, that the
entropy of some source tells you the rate at which
you can compress samples that are taken from that
source, say, independently.
And the same thing is true in quantum mechanics. If
you know the entropy of some source, that tells you
the minimal number of qubits per copy, per sample,
that are required to compress this thing without
distortion. And so we can say then we have this --
an information theoretic interpretation of this
geodesic curve. It's the minimal number of qubits
required to compress the information in this
boundary interval and send it somewhere else. If
somebody held this material, you know, in principle,
and wanted to take a part of it and send it to a
friend, you can ask how many qubits would be
required. And it's going to be governed by this
entropy quantity.
But of course we had other curves out here. We had
these other curves that weren't the geodesics, that
weren't the minimal ones. And we know that their
lengths are generally going to be longer, just like
non-optimal compression protocols are going to use
more qubits. So the question would be, maybe
there's a correspondence between these non-geodesic
curves and non-optimal communication protocols. Of
course the answer is going to be yes.
I should just say another way of thinking about
compression in a quantum mechanical context is
instead of counting the number of qubits that need
to be sent, you could say, well, if Alice is going to
try to teleport this A part to Bob, how many Bell
pairs would she need? And it's, again, just
governed by the entropy, because she compresses and
then she teleports.
Okay.
Krysta Svore: One minute.
Patrick Hayden: Oh, really? Okay. Let's see.
Well, I guess I'll just say that -- I'm going to
have to go quickly here. A general curve, what you
can do if you actually want to calculate the length
of a general curve, and this was done by these
authors here, is that you could try to approximate
it by segments that were geodesics that terminate on
the boundary. Because geodesics that terminate on
the boundary, we understand what those mean. And
this is roughly how you do it. You have a bunch of
intervals, you subtract off some overlaps. And
in the limit where the shift between the intervals
becomes infinitesimal, you actually reproduce the
length of the curve. And you get a formula for the
length of a general curve in terms of things that we
kind of know already. But the formula looks
like an entropy of something minus an entropy of a
part of that something. And again, those of you who
know information theory, what is the entropy of
something minus the entropy of a part of something?
It's a conditional entropy. Right?
And conditional entropy, despite its defects, you
know, its quirks -- in quantum mechanics it can be
negative, so a negative uncertainty is a bit of a
strange thing; you'd need to be more than certain, as
was observed by these scientists. It nonetheless
actually has an operational interpretation, that if
you want to ask the question how hard is it to
teleport a bunch of As, you know, As are parts of
some quantum systems, to Bob, and Bob already has
part of that system, the Bs, then the cost, in terms
of bell pairs, is the conditional entropy. And that
cost can be either positive or negative. If the
cost is positive, of course it means that you
actually have to use up some bell pairs. If the
cost is negative -- you know, we're at Microsoft
where you're familiar with negative costs. Those
are called profits. Then you actually earn some
entanglement out of this process that you could use
in the future.
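A small numerical sketch of that bookkeeping (my addition, not from the slides): the merging cost S(A|B) = S(AB) - S(B) comes out to -1 for a Bell pair, a profit of one ebit, and +1 when there is no entanglement to exploit.

```python
import numpy as np

def von_neumann_entropy(rho):
    """Entropy in bits of a density matrix."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]              # drop numerical zeros
    return float(-np.sum(evals * np.log2(evals)))

def conditional_entropy(rho_ab, dA, dB):
    """S(A|B) = S(AB) - S(B) for a state on a dA*dB-dimensional space."""
    rho_b = np.trace(rho_ab.reshape(dA, dB, dA, dB), axis1=0, axis2=2)
    return von_neumann_entropy(rho_ab) - von_neumann_entropy(rho_b)

bell = np.zeros(4)
bell[0] = bell[3] = 1 / np.sqrt(2)            # (|00> + |11>)/sqrt(2)
print(conditional_entropy(np.outer(bell, bell), 2, 2))   # -> -1.0, a profit

print(conditional_entropy(np.eye(4) / 4, 2, 2))          # -> +1.0, a cost
```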
So just fast forwarding a little bit here.
But the basic idea is that the length of a general
curve in spacetime ends up being the cost for a
process where Alice is trying to teleport
information corresponding to bits of the curve to Bob,
but Alice and Bob are constrained geometrically in
where they can act. So like here, Alice and Bob
at any given time have to act in one interval.
Bob already has part of it, so in the communication the
cost is going to be a conditional entropy, and then
they move along to the next interval. And when you
add all of these up then you actually recover the
formula for the length of the curve.
So I apologize for having raced ahead a little bit
there, but that is the story, that the length of at
least a convex curve is the minimal entanglement
cost for Alice to transfer the boundary state to Bob
when Bob is restricted to act locally in some
intervals that are determined by the geometry of the
curve. And that's the information theoretic
interpretation of the length of a curve in space.
So the conclusion is that bulk geometry and boundary
entanglement are intimately connected. And I think
that nontrivial results from quantum information
theory, like this state merging that I had to blow
through a little bit from 2005, can teach us things
about the geometry of spacetime that would not have
been accessible and understandable without quantum
information theory.
There is a bit of a movement right now to try to
understand, try to see whether we can see the
emergence of spacetime from the structure of
entanglement and whether -- we don't know the extent
to which that will be possible but these are some
tantalizing hints that something along those lines
should be true.
So thanks for your attention.
[Applause]
Yes?
>>: Very elegant. I thought Hawking radiation
solved the problem of conservation of information.
Are you saying there's some question about that now?
Patrick Hayden: Yes. So I didn't talk much about
the information paradox, but this correspondence,
like Maldacena's AdS/CFT, back in the late
90s, caused people to essentially think that the
information paradox had been solved. The reason
being that you could -- in this setup you could
describe something that looked like black holes, and
that black holes had some evaporation process. But
they have this dual picture, this dual picture where
there's no gravity and it's just a standard quantum
mechanics. And the standard quantum mechanics, we
know is unitary, doesn't destroy information. So
the reasoning was that black hole evaporation should
be unitary. Now, I think most people still believe
that to be true. But understanding in detail how
information comes out and how the -- how our sort of
semi-classical understanding of physics can be
consistent with the unitary black hole evaporation
is very much -- I think pretty much everyone who has
thought about this is just confused at the moment.
I mean confused to the point where there's actually
disagreement as to what happens, not when you hit
the singularity of a black hole. But if you're
falling into a large black hole, then as you cross
the horizon, that can be a region of arbitrarily
small curvature, so you shouldn't expect anything
special to happen as you cross the horizon. So we
think that if you're free falling across the horizon
nothing special should happen. But there is a
significant minority of people now who would claim
that when you hit the horizon, that's the end of
spacetime or you burn up or something horrible
happens, like a horrific violation of generally
relativity. And this is because they can't figure
out how the entanglement stitches everything
together properly without violating cloning.
Krysta Svore: So one more question while Mario
sets up.
>>: So we learn that when you mention is
[indiscernible] quantity and [indiscernible] --
Patrick Hayden: Relative entropy?
>>: Just [indiscernible].
Patrick Hayden: Okay. Yeah. Oh, yes.
>>: And there are entropy notions that are the sum
of [indiscernible].
Patrick Hayden: Yes, so I totally glossed over that
story. And in order for this correspondence to work
really nicely and to get general relativity and sort
of smooth geometry in the bulk, you need a lot of
degrees of freedom per site in the boundary. And
that kind of, I think, can replace the many copies.
There's some kind of thermodynamic limit happening
in the boundary already. So I think, although we
haven't proven this yet, that the one-shot entropies
will actually coincide with the von Neumann entropy. But
otherwise, there always have to be caveats in what I
said as far as many copies, blah, blah, blah.
Thanks for the question.
Krysta Svore: So now we're going to hear from
Mario Szegedy on what condensed matter physicists
would rather not talk about. It will maybe be
three negative results.
Mario Szegedy: Well, just to be brief. So probably
all of you, or some of you know this Barbie Doll
that said that math class is tough. And actually
Microsoft redesigned it and this
technically-advanced Barbie Doll now says computing
mean values and ground states of local interaction
systems is tough. Actually this Barbie Doll has
become an instant success with toddlers. And I
will explain why. Because actually what she says is
true, that computing mean values and ground states
are actually harder than some think.
So I want to just talk briefly about theorems and
results. And my talk is going to be just the
opposite of Patrick's talk, because I am just
talking about mathematics and not intuition. It's
like a Hungarian hobby to get deeply involved into
proofs without stating the theorem. And it's probably
not going to be as interesting or exciting.
So the first -- so these three parts are completely
disjoint, so if you don't like the first or you
don't like any parts you can go to the coffee and
come back for the next part.
So the first is actually it's not my result, but
it's a result of Alistair Sinclair and Piyush
Srivastava. It says the mean magnetization in the
ferromagnetic Ising model is actually #P-hard to
compute. Now, my result is that I started to
work -- actually it's very exciting research now --
with Srivastava, and this is just a complete
by-product: we just observed that some of our
thoughts actually imply the first result in a very
simple way. And so I won't even tell you any of the
definitions except that this ZI is the partition
function of the ferromagnetic Ising model, and the
average magnetization is expressed by taking the
derivative of this polynomial and just creating this
expression.
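As a sketch of the kind of expression he means (one common parametrization; the slide's normalization may differ), with Z_I(\lambda) the ferromagnetic Ising partition function viewed as a polynomial in a field variable \lambda, the average magnetization is a logarithmic derivative:

\[
\mathcal{M}(\lambda) \;\propto\; \lambda\,\frac{d}{d\lambda}\log Z_I(\lambda)
\;=\; \frac{\lambda\,Z_I'(\lambda)}{Z_I(\lambda)} .
\]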
So the statement is that computing this is #P-hard.
So don't worry about #P. #P means it's harder than
NP, it's harder than quantum computing. So it's
just really hard.
So actually there is a multivariate version of this
partition function. And the proof uses it in -- maybe
in an intriguing way -- I mean, I almost did
this. So both the Sinclair-Srivastava proof and ours
use this multivariate version.
So you don't need to know even what this polynomial
is for the proof, just a couple of facts. So the
first fact is that ZI -- so the partition function
itself -- is #P-hard to compute. And actually the only
reason -- the only thing that stops you from proving
that this logarithmic derivative is #P-hard is
that there could be a common divisor of ZI and its
derivative. So when we look at this formula, it
somehow simplifies.
So the first thing to understand -- so actually the
only thing to prove -- to be proven for this theorem
is that this does not happen. So what I want to
prove is that this, let's say, mystery polynomial
and this derivative don't have a common root.
Now here comes a famous theorem. And even if I don't
say anything about my research, this is a fact
that is very good to know, which is the Lee-Yang
circle theorem.
>>: Sorry, [indiscernible].
Mario Szegedy: Well, it's N particles.
>>: Okay.
Mario Szegedy: Ising model, N particles. It's a
finite model, so #P-hardness of course classifies it
in terms of N. So the Lee-Yang -- and by the way,
this N is the number of particles. So the Lee-Yang
theorem says: take the partition function of the
ferromagnetic Ising model and simply [indiscernible];
it can be 0 only in the following ways. Either
all the XI's are on the complex unit circle, so
this is a complex thing, or some are inside the
complex unit circle and some are outside.
But it cannot be that all are inside. So that's a
very famous fact. And this helps prove that the
ferromagnetic Ising model does not have a phase
transition away from zero field.
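In symbols, the multivariate Lee-Yang circle theorem says

\[
|x_1|<1,\;\dots,\;|x_N|<1 \;\;\Longrightarrow\;\; Z_I(x_1,\dots,x_N)\neq 0,
\]

and by the spin-flip symmetry x_i \mapsto 1/x_i the same holds when all |x_i|>1; so at any zero, either every x_i lies on the unit circle or some are inside and some are outside.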
Now, so that was an important fact, so we need yet
another fact. It is that if I am taking this
polynomial and the partial derivative with respect to X1,
then on the unit circle -- and actually here I could
even take different lambda I's on the unit circle --
this partial derivative is not 0. And there is yet
another fact, this is the last one, that actually
this mystery polynomial is multi-affine. So what that
means is that if I am looking at
it only as a polynomial of a single XI, then it's linear in
each XI separately. But of course it's not linear in
all of them together, so it's called multi-affine.
So now let me at least state the theorem, so that you
know what the theorem is, as I said. What I want
to prove is that ZI and this derivative don't
have a common root. So there is no lambda that is
a root of both; that's what we are
going to prove. And the proof is strikingly
simple -- it is the remainder of this slide and the next
page. So assume -- and I am just using those facts.
So assume that actually there is a common root of ZI
and this derivative. Oh, so I am using ZI, so overloading the
notation: I am using it as a multivariate
polynomial like that, and I am also using it as a
single-variable polynomial where for each XI I just
substitute lambda. So here of course we are talking
about a single-variable polynomial. And what the
Lee-Yang theorem says about this single-variable
polynomial is: if we put lambda, lambda, lambda
everywhere, and the lambda were all larger than 1 or
all smaller than 1, then we would be either outside the
unit circle everywhere or inside the unit circle
everywhere, and so this configuration is forbidden
for the roots of this multivariate polynomial.
So how do I get the contradiction? So I want to get
a contradiction from this assumption. So I'm going
to just -- so let's assume that there is such a
lambda 0 and so I am looking -- I am replacing
lambda 0 into ZI everywhere, and now I am perturbing
these values a little bit. And then I am perturbing
it in two ways, and it's important that it's in two ways. So
first I am just attaching a multiplier 1 minus
epsilon everywhere, and the other way is that I am
also adding a little perturbation to the first
coordinate, and I claim that from these two
perturbations I can make a 0. So by fixing epsilon
and delta appropriately, I can create a 0 in an
illegal place. If delta has smaller absolute
value than epsilon, then -- so the first perturbation
with epsilon means I'm going inside with epsilon,
and here I am adding a little delta smaller
than epsilon, so I am always inside the circle. And
thus by the Lee-Yang theorem this is impossible. So that's
what I want to show, that I can create this
situation with some epsilon and delta.
So how do I choose that epsilon and delta, that's
the only thing I have to tell you. And here is
the -- actually the end of the proof with the
[indiscernible] sign. And let me just show you the
picture. So, again, I perturbed this way everything
and then here I moved a little bit away. Now, what
was our assumption? Well, it was that lambda 0 was
the root of the polynomial itself and also of the
derivative. So that was the common lambda 0 of the
root of both. So therefore, the polynomial itself and the
derivative itself disappear, so if I am now
looking -- so I'm making the first perturbation,
then I'm getting at the point ((1 minus epsilon)
lambda 0, (1 minus epsilon) lambda 0, ..., (1 minus
epsilon) lambda 0), and there the
polynomial, the multivariate polynomial, takes a
value which is second order small in epsilon.
So now I am using the fact about the partial
derivative -- that was one of the facts, fact 4 -- that
the partial derivative does not
disappear at X1. So now, whatever small
value the ZI takes
at this point, I can compensate it with delta,
because, by the multi-affine property,
the function looks like this: when I am
at the point here and then here, it's
almost the same as when I am at the point, well, at
this junction, there, there, there, except that
there is this additional term. But because this is
not 0 and this is very small, I can set delta so
that it is going to be smaller than epsilon, and so
this whole thing disappears; as a matter of fact
this is how I should set it.
And so that was actually a 30-page paper by
Alistair and Piyush, so actually what you should
appreciate is exactly the simplicity of this proof
and nothing else. But our research hopefully yields
some fruits very soon under -- related to the
Lee-Yang theorem.
So now a completely different thing, as Part
2. And so let me switch gears here. Okay, so here
is the hope. So now, this was actually classical,
although I did not say that. But the -- this was
the classical ferromagnetic Ising model. So now I am
talking about quantum. And I am talking about the
area law, and Patrick has already talked about it,
and some of it I might very briefly repeat.
But so the hope is that -- why does the area law matter
in condensed matter physics? Because the hope is
that if the area law holds, like if I have a material
and I am looking at the ground state of the
Hamiltonian created by the interactions between the
particles, then we hope that the area law holds for
the ground state. And so we can describe the ground
state by cutting the material in two pieces. And
since there is small interaction between the pieces --
that's what the area law means -- we can put together the
information and get the entire information on the
ground state. So that's the hope.
And so our result is that actually the area law does
not hold -- at least the general area law, at least the
very general area law for graphs, does not hold. So
Patrick has talked about it and I just repeat
that -- so the area law requires a notion of
entanglement entropy. And it's good to have as many
measures about entanglement as we can, because
that's what we study in quantum. And the first
measure is of course whether something is entangled or
not. So if it is a product state [indiscernible] then it's not
entangled. And so if you have a bipartite system, the
index or the label 1 means that it is in
the first system and 2 means it is in the second system,
and if it looks like this then it is entangled.
So now this grandiose measure, this entanglement
entropy, which we think is the most elegant
measure for entanglement, is defined as follows. We
have the two parts of the system,
and we can either trace out the first part or we can
trace out the second -- you choose which one, but
whichever part you choose to trace out, the
entanglement entropy is defined as -- so you get a
density matrix, and it is simply the entropy of
the density matrix. So if you do the other you just
get the same.
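A minimal numerical sketch of this definition (my addition): for a pure bipartite state, the eigenvalues of either reduced density matrix are the squared singular values of the coefficient matrix, so the entanglement entropy can be read off from an SVD.

```python
import numpy as np

def entanglement_entropy(psi, dA, dB):
    """Entanglement entropy in bits of a pure state on C^dA x C^dB."""
    svals = np.linalg.svd(psi.reshape(dA, dB), compute_uv=False)
    p = svals**2                  # spectrum of either reduced density matrix
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)    # (|00> + |11>)/sqrt(2)
product = np.kron([1.0, 0.0], [0.0, 1.0])     # |0>|1>, a product state
print(entanglement_entropy(bell, 2, 2))       # -> 1.0 bit
print(entanglement_entropy(product, 2, 2))    # -> 0.0, not entangled
```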
And here is an example -- so basically -- so
Patrick explained it better, but I just wanted to
explain it this way: so when you trace out Bob
and you have like a state [indiscernible], what do
you get? Well, you get the probability distribution
on states. So what is the amount of the information
to describe Alice's state in -- well, you know,
Alice, after the measurement, so that's the other
explanation for mixed states. The first is with
density matrix, but the other explanation of
mixed states is just the probability distribution on
states. So if I want to tell you which state, so if
all you know is that it's one of these states with
probability 1 over N [indiscernible], then how much
information do I have to tell you? Well, it's log
of N bits of information. So here there could be a bit of
confusion between base-e log and base-2 log
depending on whether you talk with a physicist or
with a computer scientist, but I don't want to be a
judge here. So actually in one slide I had one and
in the other I had the other, probably just to
confuse anyone.
So this is another way to explain entanglement
entropy, that if we trace out Bob, what is the
information I have to tell you if I want to
specify Alice's state.
So what actually gives hope to the area law -- and here
I am actually -- I am telling you what the area law
is, in this general area law form, which is actually a
true statement, and this is mathematically proven;
the catch is that it is only for commuting terms. So if
these local Hamiltonians that describe the
interaction commute with each other -- when they
don't overlap they always commute, but when they are
neighbors we have to require that they are
commuting -- so if they are commuting, if they are
local, so we normalize them
appropriately, then -- so the area law says that for
such a Hamiltonian, if it is gapped, meaning
that the energy difference
between the energy of the ground state and the
[indiscernible] state is some constant delta, then
the entanglement entropy between two parts, like
here the black part and the white part, is upper
bounded by the size of the cut. And the cut is just
the number of edges we have to cut between
the two parts. So here it is three. So it's upper
bounded here by three times some constant,
so that's what the area law says. And it's true, and
in the commuting case it is just simply true.
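Schematically (my rendering of the statement; the constant depends on the gap and on the norm of the local terms), the commuting-case area law reads

\[
S(\rho_A) \;\le\; C\cdot\big|\mathrm{cut}(A,\bar A)\big| ,
\]

where |cut(A, \bar A)| counts the interaction-graph edges crossing between the two parts -- three of them in the picture just described.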
And so our result is that in the non-commuting
case for general graphs it's not true. And so we
know actually that for the general non-commuting
case, due to Matthew Hastings, it is true in one
dimension, and then there are
some [indiscernible] improvements, and so our result
says that the general case is not true -- I mean the
non-commuting case is not true -- and even, so the graph
that we found is actually just one edge, so the
area law would say that -- and its Hamiltonian is
[indiscernible], so the area law would say that the
entanglement entropy is upper bounded by 1 or a
constant, and it's simply not true. So that's the
second negative result.
And I have some time for the third, I hope, and I
will be very quick. Minus two minutes.
Krysta Svore: Fifteen minutes.
Mario Szegedy: Okay. That's great. So the next we
have more time. So I have three more slides.
So when I go to, let's say, a [indiscernible] talk, then I see the following rosy picture about condensed matter physics. Okay, so we cannot compute the ground states for every Hamiltonian we want, but at least we can compute ground states for some small subset of matter. So at least if we have some interaction graph that is local and grid-like, and the Hamiltonian is gapped, and the interactions are translation invariant -- if it has all these good properties -- then we can say something about the ground state.
And now the third negative result says that actually we cannot. Even if you just want to compute some mean value, that's already hard to tell.
So enter the Kolakoski sequence. The Kolakoski sequence is a fun sequence. When I worked with [indiscernible], everyone was looking at the Kolakoski sequence, starting from Jeff [indiscernible] and others -- of course, just those who were crazy enough to work on this little stupid thing. So we have a sequence: 1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2.
So what is this sequence? Look at the runs and the lengths of the runs. The length of this run is 1, the next length is 2, then 2, 1, 1, 2, 1. So what do you see? Well, you see that you actually get back the same sequence. So this is a unique sequence -- well, if you start with 1, it's the unique sequence with this property that the run lengths reproduce the sequence itself; if you start with 2, there is another unique sequence. But let's assume you start with 1. So since 1965 it has been unknown whether the number of 1s in this sequence is asymptotically equal to the number of 2s.
So we have a very simple rule, and we simply don't know whether, as N goes to infinity, the fraction of the entries that are equal to 1 tends to 50 percent.
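(A small sketch of my own, not from the talk -- the function name and the million-term check are illustrative -- showing how to generate the sequence and test the fraction empirically.)

```python
def kolakoski(n):
    """First n terms of the Kolakoski sequence starting 1, 2, 2, ...

    Each new run alternates between 1s and 2s; its length is read off
    the sequence itself, which is exactly the defining property above.
    """
    seq = [1, 2, 2]
    i = 2  # position whose value gives the length of the next run
    while len(seq) < n:
        seq.extend([3 - seq[-1]] * seq[i])  # 3 - x alternates 1 <-> 2
        i += 1
    return seq[:n]

s = kolakoski(10**6)
print(s[:12])               # [1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2]
print(s.count(1) / len(s))  # empirically close to 0.5 -- but unproven!
```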
So how does it relate to condensed matter? Well, we can create a sort of crystal out of this Kolakoski thing. And that's really my last slide, although I should have continued with some other examples, because this crystal has one problem: it's not completely translation invariant, because it's just defined on this quadrant, and this corner point is different. So if I wanted to make it completely translation invariant, I would have to look at [indiscernible], for instance, and things like that. But this is just ongoing research anyway, so I don't know how far I will get, and these are not huge things. But notice that if I write down the Kolakoski sequence horizontally and also vertically, and put the two things together, then I can create local, translation-invariant rules whose unique solution is exactly the Kolakoski sequence.
So in this material -- let's say 1 is spin down and 2 is spin up -- the 1s and 2s cancel out in the magnetization exactly if the 50 percent property holds for the Kolakoski sequence. So it's a very simple rule, and yet no one knows. It shows that even such simple problems are hard.
So sorry for being so negative. This is my negative talk. Thank you very much.
[Applause]
>>: I have one question. The [inaudible] that you mention is computing such a number, and it's basically saying it's computing exact values [indiscernible].
Mario Szegedy: Very good question. Yeah, very good question. It can be done -- so that's already positive, more positive than me. It can be approximated.
>>: [Indiscernible] like, is it constant? Or --
Mario Szegedy: I think within arbitrary -- I mean within epsilon -- a factor of 1 plus epsilon, and you choose epsilon.
>>: But is it polynomial in epsilon?
Mario Szegedy: That I am not sure about -- whether the epsilon gets into the exponent, or whether it's a factor in the running time. I mean [indiscernible] or something like that.
Any other questions?
[Applause]
Krysta Svore: Okay, so now we're going to hear
about some quantum algorithms. Matt Hastings is
going to talk to us about quantum chemistry by
quantum simulation and talk about the recent results
in the algorithms and estimates for the runtime.
Matt Hastings: Thanks. So this is work that was done with a number of people, some of whom are here -- Dave Wecker, and Nathan Wiebe, and Matthias Troyer -- as well as some other people who are not here right now.
And if I'd given this talk half a year ago it would have been quite a pessimistic talk, but now it's a fairly optimistic talk. The question is: if we want to simulate quantum chemistry on a quantum computer -- and this has always been put forward as an application of quantum computers, one of the first things you might do; it was one of Simon's original reasons for being interested -- how hard is this? It's not so hard to see that you can do it, that you can simulate any quantum system, at least if you're talking about simulating the time dynamics of the quantum system, in polynomial time on a quantum computer. But really the question is: what is this polynomial, and just how long is it going to take?
And we've had a large number of improvements. Some are algorithmic improvements, where it's really a computer science issue: we're doing exactly the same mathematical computation, just with a faster algorithm. Some are physics-inspired, relating to reordering some of the computation to reduce the error. And some are improvements in the error bounds, in our understanding of the error. They've all led up to a really large change in how fast we think this problem can be simulated.
So why is quantum chemistry worth simulating? The reason is that you can do interesting problems with a small number of logical qubits. Of course the catch is that I'm talking about logical qubits, and this might require an enormous number of physical qubits. But you can already do interesting, relevant problems with on the order of a hundred logical qubits, with the number of qubits required going up as the size of the molecule increases or as the basis-set size increases.
So what is it that we want to do? For some molecule, we want to estimate the ground state energy, and we want to estimate some observables in the molecule, like where the electrons are, and thereby get other properties, like polarizability.
And how do we do this? Well, what we have is the Schrodinger equation describing electrons moving in the continuum. It was originally written down by Schrodinger, and there's this grad-squared sitting in there, acting on the wave function of a particle at some position in space. What you have to do is make it into a finite problem. So instead of the particle having a continuum position in free space, you truncate to a finite basis set. And this allows you to represent the problem on a computer -- either a quantum computer, or the classical computers where this is currently being done. There are standard basis sets; there is a large literature on what good ones are, and larger basis sets will give you a higher level of accuracy.
So you truncate to some finite problem, and the problem then is to estimate the energy in this basis set. The basis set gives you a set of orbitals in real space, and each of these orbitals can be filled by an up electron or a down electron -- hence two electrons can fill each orbital.
Typical basis sets come from combinations of Gaussians. And I just want to emphasize that there are classical packages -- PyQuante, which was mentioned, is an open-source package; Gaussian is another one; Psi4 -- there are many classical packages you can get, some open source, some commercially available, that have a well-developed theory for generating these basis sets and then generating the needed interaction terms in that basis set.
What do I mean by the interaction terms in this basis set? Well, eventually what the package spits out is a Hamiltonian for the problem in the following form. You have p and q labeling spin orbitals -- I'll use the term spin orbital to refer to both an orbital degree of freedom and a spin degree of freedom, so each label picks out spin up or down as well as some particular basis function. There are two types of terms. There are terms a†_p a_q with some coefficients h_pq, so the diagonal ones give an energy term and the off-diagonal ones give a hopping term. And then there are terms h_pqrs a†_p a†_q a_r a_s, which are interaction terms, and these come from the Coulomb potential. Schematically (up to coefficient conventions):

    H = Σ_pq h_pq a†_p a_q + (1/2) Σ_pqrs h_pqrs a†_p a†_q a_r a_s
So one thing I should emphasize, maybe especially for the physicists in the audience: usually in physics we're used to seeing these interaction terms looking diagonal, like a†_p a_p a†_q a_q. And such terms do exist in our problem, but here, because of the particular basis sets we work in, we really have all these terms being non-zero. You can destroy electrons in one pair of orbitals and create them in another pair of orbitals, and it's a complicated structure of terms that is present in these quantum chemistry problems.
So how do we do this? Well, the particular representation we've been using -- there are other ones that are more compact in terms of qubits, but they actually don't gain that much in the qubits required and then have a large time overhead -- takes two qubits per orbital, hence one per spin orbital, and we just have a very naive representation of the state on our quantum computer. A qubit up means there's a particle in that spin orbital, and a qubit down means there's no particle in that spin orbital. So it's a very straightforward transcription from the states of the molecule, where the electrons are sitting in these orbitals, to the state on the quantum computer.
We then prepare the system in a simple state -- simple meaning just something we can easily prepare -- which has a reasonable overlap with the ground state. We'd like it to be as good as possible. There are product states, these Hartree-Fock states, which I'll say a little bit more about later, that have reasonably good overlap for the sizes we're interested in.
Then what you do is use quantum phase estimation. Quantum phase estimation is an algorithm which, given an operator like this Hamiltonian H, allows you to project into eigenstates of that operator and measure the corresponding eigenvalue. You repeat this many times. The state you initially prepared did not have perfect overlap with the ground state, so the first time you measure, maybe you will project onto the ground state and get the ground state energy; maybe the next time you'll get a different energy, because you weren't perfectly overlapping with the ground state and had some overlap with a different state. So you get a sequence of energies out, record the lowest one seen, and that's the estimate of the ground state energy.
In this talk I'm going to ignore all the issues that would occur if you can't prepare a state with high overlap. These issues are a research problem for the future, and they'll get worse as the molecule gets bigger. But we've been doing a lot of simulation, and for the molecule sizes we can simulate on a classical computer this really seems not to be an issue, and we expect it not to be a real issue for the early sizes we would be able to simulate on a quantum computer.
So how does this quantum phase estimation work? The way it works is that you have this Hamiltonian H, which is a sum of terms that I'll write as H_k. Each of those terms H_k is one of the terms in the expression above: given a choice of p and q, or a choice of p, q, r, and s, I'm just representing it schematically as H_k. So the sum of these H_k is the Hamiltonian H, and you have the unitary e^{iHt}, which describes the evolution under this Hamiltonian for a certain period of time.
What you would like to do is implement a controlled unitary. You have a certain extra qubit called the phase estimation ancilla, and depending upon that extra ancilla, you either do or do not apply this unitary. And then you essentially do an interference experiment between applying this unitary and not applying it. You take your ancilla and prepare it in a superposition of up plus down, then do this controlled evolution. What you find is that the branch where you apply the unitary picks up an overall phase e^{iEt}, where E is the energy of the state. So you pick up a phase that depends upon the energy of the state, and then you measure the ancilla. You're essentially interfering two different trajectories and in that way measuring the energy with respect to the Hamiltonian. So in all the plots you'll see later, when I start showing circuits, you'll see this extra phase estimation ancilla. That's just to control this unitary, to determine whether or not we apply the Hamiltonian evolution.
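(Here is a small numpy sketch of that interference idea -- a Hadamard test on an exact eigenstate. The random Hamiltonian and all names are mine, just to illustrate the phase kickback.)

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = (M + M.conj().T) / 2                  # a random Hermitian "Hamiltonian"
E, V = np.linalg.eigh(H)
psi, E0 = V[:, 0], E[0]                   # ground state and its energy

t = 0.3
U = expm(1j * H * t)                      # applied only on the ancilla's |1> branch
# Interfering the two branches: P(ancilla = 0) = (1 + Re <psi|U|psi>) / 2,
# and for an eigenstate <psi|U|psi> = e^{i E0 t}.
p0 = (1 + (psi.conj() @ U @ psi).real) / 2
print(p0, (1 + np.cos(E0 * t)) / 2)       # the two agree
```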
So we now face an issue, though, of constructing this controlled unitary, which is e to the i times the sum of the terms. We don't know how to do that exactly. There's a large literature on different ways of doing it, to different levels of approximation. If all the terms commuted, what we could do is apply e^{iH_1 t}, then e^{iH_2 t}, and so on. And we do have circuits that allow us to implement e^{iH_k t} for all those terms that I've written down; I'll show you those circuits. However, the terms don't commute, so this will not be correct.
But a simple approach is the so-called Trotter-Suzuki approach, and it's based on the following formula. If you want to do e^{A+B} where A and B are matrices that don't commute, you can write it as (e^{A/N} e^{B/N})^N, and the error in this expression gets smaller as N goes to infinity. Each factor e^{A/N} e^{B/N} is roughly e^{A/N + B/N}, up to a correction of order 1 over N squared in the exponent, and then overall the error becomes of order 1 over N.
You can improve a little bit on this with higher-order formulas, and there are even higher-order formulas beyond those. So basically, we want to evolve for some long time to do this phase estimation. The time T that we need in the phase estimation is roughly 1 over the energy accuracy that we want to get in the end; we're trying to resolve very small differences in the energy, so in order to do that we need to go for a very long time. And rather than going to that long time in one large step, what we do is first go a little bit under the first term in the Hamiltonian, then a little bit under the next term, then a little bit under the next term -- a little bit under each term, and so on. That's essentially how the Trotter-Suzuki expansion works.
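(A quick numerical sketch of that 1 over N behavior -- the random matrices are of my choosing, nothing from the talk.)

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

def rand_herm(d):
    M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (M + M.conj().T) / 2

A, B = rand_herm(4), rand_herm(4)          # non-commuting Hermitian matrices
exact = expm(1j * (A + B))
for N in (1, 10, 100, 1000):
    trotter = np.linalg.matrix_power(expm(1j * A / N) @ expm(1j * B / N), N)
    print(N, np.linalg.norm(trotter - exact, 2))   # error shrinks ~ 1/N
```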
So now let me give circuits to do each of these terms; I said we could do e^{iH_k t} for each term. For example, the simplest term you would have is a†_p a_p. This is just a number operator: it's either 1 or 0 depending upon whether there's a particle in the p spin orbital, which is to say it just picks up a phase depending upon whether that qubit is up or down.

So this one is really quite simple. Remember, this qubit p will be either up or down depending upon whether a particle occupies that spin orbital. And this other wire is the phase estimation ancilla: depending upon the value of the phase estimation ancilla, you either do or do not apply a rotation to that qubit. Because in this case -- ignore the phase estimation ancilla for a second -- e^{i h_pp a†_p a_p t} is just a Z rotation. You're just changing the phase of the up state relative to the down state, so you're just applying a rotation about the Z axis. With the ancilla it's a controlled rotation about the Z axis. So that's what you apply to do this operator.
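(A two-line check of that claim; my sketch, with an arbitrary angle.)

```python
import numpy as np
from scipy.linalg import expm

# The number operator n_p = (I - Z)/2 is diagonal, so e^{i theta n_p} is
# diag(1, e^{i theta}) -- a Z rotation up to a global phase.
theta = 0.7
n = np.diag([0.0, 1.0])                               # empty / occupied
U = expm(1j * theta * n)
Rz = expm(-1j * (theta / 2) * np.diag([1.0, -1.0]))   # standard R_z(theta)
print(np.allclose(U, np.exp(1j * theta / 2) * Rz))    # True
```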
If you want to do a term like h_pq a†_p a_q, it's a little bit more complicated. It's a hopping term, and it's represented by a circuit like this. By the way, if there are any questions about this one, please ask me, because the circuits are going to get progressively more complicated and more and more involved.

So what do we need to do to do a†_p a_q? For a second, ignore these two lines sitting in the middle right here. What a†_p a_q does is remove a particle from one of the states and create a particle in the other state, or vice versa. So we need something that is off-diagonal: it will change, you know, up/down to down/up. That's what these basis-change gates do. H here refers to a Hadamard gate; it applies a Hadamard on each of these qubits, which interchanges the X and Z bases.

Then there is a CNOT -- and as I said, don't worry too much about the stuff in between for a second. There's a CNOT from p to q, and the effect of this CNOT is that Z on q now carries X on p times X on q. How did that happen? Well, the Hadamards turned Z on p into X on p and Z on q into X on q, and then the CNOT basically added the two of them.

Then we apply a controlled rotation and we undo everything. The effect of this is to do an e^{i theta X_p X_q}, and that's something that has the correct off-diagonal property of removing a particle from one orbital and creating it in the other. That's the two X's. We also do the same thing in the Y basis, which gives us an e^{i theta Y_p Y_q}. The reason for this is that if we just did the first term, we would correctly get the term where the particle hops from here to here, but we would also get a term where we go from no particles to two particles, and combining the two cancels it out.
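(You can see both facts numerically in a small sketch of mine, not from the paper.)

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
XX, YY = np.kron(X, X), np.kron(Y, Y)

# (XX + YY)/2 flips |01> <-> |10> (hopping) and annihilates |00> and |11>,
# so the evolution only mixes the one-particle states.
U = expm(1j * 0.4 * (XX + YY) / 2)
print(np.round(U, 3))      # identity on |00>,|11>; a rotation on |01>,|10>

# XX alone would also couple |00> <-> |11> (pair creation/annihilation):
print(XX[0, 3].real, ((XX + YY) / 2)[0, 3].real)   # 1.0 vs 0.0
```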
Important point: these things in the middle that I've been telling you to ignore -- what are they for? There's a sign that's supposed to enter in. These particles are electrons, so when you move one from p to q, there's an overall sign that should come in, a fermionic sign, because interchanging electrons picks up a minus sign, and when we rewrite things in this basis of spins we have to get those signs out correctly. The correct way to do that is to pick an arbitrary ordering of the spin orbitals, and having picked this arbitrary ordering, you put in an overall minus sign depending upon the parity of the occupied orbitals in between. So when going from here to here, and vice versa, you look at all the orbitals in between, count the total number of electrons in there, and count whether it's even or odd; and if it's odd, rather than doing e^{i theta}, you do e^{-i theta}. So the effect of these gates here is that we basically take this one, add it to this one, add it to this one, add it to this one, and in the end we succeed in counting the parity. So that's how this hopping term works.
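(In code, the sign rule is just a parity count over the orbitals between p and q; a toy sketch with a made-up occupation list.)

```python
def hop_sign(occ, p, q):
    """Fermionic sign for hopping between spin orbitals p and q:
    (-1) to the number of occupied orbitals strictly between them."""
    lo, hi = sorted((p, q))
    return -1 if sum(occ[lo + 1:hi]) % 2 else +1

occ = [1, 0, 1, 1, 0, 1]      # hypothetical occupations of 6 spin orbitals
print(hop_sign(occ, 0, 4))    # two occupied in between -> even parity -> +1
print(hop_sign(occ, 0, 3))    # one occupied in between -> odd parity  -> -1
```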
The other term, the pqrs term h_pqrs a†_p a†_q a_r a_s, is even more complicated. It involves four different choices of orbitals, but it's essentially a more complicated version of the previous one. There are various basis-change gates, Hadamard gates, and a more complicated fermionic string here to get the signs: one from p to q and one from r to s, so you look from p to q and count that parity, then look from r to s and count that parity. Again there's a phase estimation ancilla, and there are lots of controlled rotation gates. So it's an incredibly complicated circuit, and that's how you implement that one -- and when I say that's how you do it, I'm just quoting the standard circuits that you could look up in the literature at the time we did our first paper. And similar circuits are done for the prrq term, which again involves four operators, but two of them are the same. It's like a controlled hopping: depending upon whether a particle is on r, you can hop.
So what's the problem with this approach? Well, one problem is that the total number of terms grows roughly as N to the 4th, where N is the number of spin orbitals, because a large fraction of these h_pqrs are non-zero. So you have a huge number of terms in the Hamiltonian. Each term requires enforcing the fermionic parity, and these Jordan-Wigner strings -- simply the strings of CNOT gates that get the parity right -- are proportional to N in general; typically the two orbitals will be a distance N apart. So that's another factor of N.
Then you might ask: what Trotter step is required? How small do we have to take it -- I said we go a little bit on the first term, a little bit on the second term, and so on. According to the most naive bounds -- I say most naive, but they were the ones in the literature that you would quote -- this would be the three-halves power of the number of terms, giving us an N to the 6th. And if you add it all up, you get an N to the 11th runtime, which is a rather big scaling. So we're going to improve it a lot.
So this relies a lot on both analytic work and numerical work. Here are some of the molecules simulated with LIQUi|> ("liquid"), which Dave Wecker gave a demo and talk about yesterday. It's really been invaluable, both in allowing us to simulate this algorithm for small molecules -- which we can simulate exactly, gaining a much better understanding of the error effects -- and in understanding how to change the circuits. With liquid, you can make a change in how you do some of those circuits I've just shown and very quickly see how the gate count changes. And it's sort of funny that some seemingly trivial changes matter -- for example, how do you order your spin orbitals? Do you order them orbital 1 up, orbital 2 up, orbital 3 up, orbital 4 up, and then orbital 1 down, orbital 2 down, orbital 3 down, and so on; or 1 up, 1 down, 2 up, 2 down, and so on? There are actually fairly large constant-factor speedups from seemingly trivial things like that. And somewhat surprisingly, most of the seemingly trivial ones were done in exactly the wrong way in the literature beforehand.
Anyway, I won't go through this table; it's a table from our first paper. The important thing is that even with rather optimistic estimates of how fast you could execute a gate, if you applied these upper bounds for the circuits and so on, you got millennia to solve the problem. You got enormously long times. So now I'll focus on the improvements to the problem.
So I'm going to start by talking about the first improvement -- I have several different improvements, so I'm going to go through a couple of different topics. The first one is getting rid of those Jordan-Wigner strings, which removes one factor of N. We're going to mostly focus on the two-body terms, the ones involving four fermion operators, because those are the ones with the most terms; you can do the same improvement trick for the one-body, two-fermion terms, but I'm just going to show the plots and circuits for the four-fermion ones, since that's where most of the terms are.
So the first thing to note: this is the traditional circuit. We can rewrite the circuit in a certain way so that all the Jordan-Wigner strings appear outside the basis-change gates. That is, rather than executing the Hadamards and then doing all this, you can interchange them and replace this CNOT with a controlled-Z, and it gives you mathematically the same operation. This might not seem like much -- I've just interchanged this one with this one, and the total number of gates is the same.
However, the advantage is this. Go back to what I had to do previously: I had to do this whole mess. First the Hs, then the Ys -- a whole bunch of different combinations: H, H, H, H; Y, Y, Y, Y; H, H, Y, Y; and so on. And on each of those we had that whole string. So we did the basis change, then we did the string, then we undid the string, then we undid the basis change, then we did another basis change, then another string, and so on. By moving the strings outside, we only need to do them once. We do the string, we do a basis change, we do this thing, then -- dot, dot, dot -- repeat the circuit with those Hs replaced by Ys and so on, and the Jordan-Wigner strings just sit outside. So that's a constant-factor improvement, but it's a pretty big one.
And then the crucial thing is that if we order these terms lexicographically -- we do a given pqrs, then pqr(s+1), and keep increasing s until we reach the maximum possible value of s, then increase r by 1, and so on -- there are a lot of cancellations possible. So I've just drawn one of those circuits from the previous slide, with a single basis change inside. Here comes the string going out, then the next string coming in, then the same thing again. But if you stare at this, most of it can be canceled. This part is undoing the string from one term and then redoing it for the next one. What you see is that you have a CNOT from this onto this and a CNOT from this onto this, and CNOT squared is 1, so I can drop that. And this CNOT squared is 1, so I can drop that. In fact I can drop all of this stuff right here, except for the little bit at the very end. So I've removed a large portion of the Jordan-Wigner string by doing this.
There is actually a very simple way to think intuitively about what that is. It seems like a mathematical trick, but intuitively it means: I need to count the parity of the sites in between. After I count the parity in between -- add it all up: 1, 0, 0, 1, 1, and so on -- when I go to the next term in the sequence, I don't need to recompute the parity of most of it; I just need to see how the parity changed. So I really just need to do one extra CNOT. You can improve this a little bit more if you add an extra ancilla that keeps track of the fermionic parity, so that rather than keeping track of it with a running sum inside the string, you pass it to the ancilla and come back. This allows even more reordering, and more cancellations are possible.
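(The intuition in a few lines, as a toy sketch with made-up occupations: updating the running parity needs one new bit per step instead of a full recount.)

```python
occ = [1, 0, 1, 1, 0, 1, 0, 1]         # hypothetical occupations
par = 0
for s in range(1, len(occ)):
    par ^= occ[s - 1]                  # one CNOT's worth of update
    assert par == sum(occ[:s]) % 2     # matches recomputing from scratch
print("incremental parity matches full recomputation")
```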
Another advantage of the ancilla is that it allows us to do a trick called nesting. You might have one term that acts on a particular set of qubits -- for example, this set and this set; those would be the four acted on, and we've just drawn it several times to indicate the H, H, H, H, then Y, Y, Y, Y, and so on. But you might have another term that acts on these two and these two. This one moves an electron from here to here but does not change the parity of this string; that one moves an electron from here to here but does not change the parity of this string. So we can actually nest them in such a way that we execute them at the same time -- we can improve our parallelism. Liquid was already looking for certain kinds of parallelism: it would realize that if two gates acted on different qubits you could execute them at the same time, pushing everything to the left as far as it could go until it hit some other gate acting on that qubit. But here we take advantage of the fact that even though the gates in this circuit don't commute individually with the gates in that circuit, there are ways in which they commute as a whole that allow us to execute them simultaneously. This actually gives another factor of N in reducing the parallel depth.
So here's a figure which is sort of almost deliberately meant to be hard to read. This is an example of a circuit that does three different h_pqrs terms: one for a particular choice of pqrs, another particular choice, another particular choice, and so on. It shows that the circuit parallel depth is 151. It's not completely obvious from the slide, because you might say, wait, that one and this one are not aligned above each other; but when that depth of 151 is computed, everything is slid as far as it can go to the left. It's clearly a complicated circuit, but it's complicated in the sense of the same structure repeated many times with different sets of strings.
After we do the nesting and the cancellation, the parallel depth gets reduced -- in this case only by a factor of 3, since this is just a few of these circuits. And the resulting circuit is some crazily complicated thing that you would never have been able to come up with by hand: multiple controlled rotations being executed at the same time, many of the gates being executed in parallel, enormous numbers of the CNOTs canceled. A lot of these CNOTs at the start would also have been canceled if there had been stuff done before; they would have been canceled against the preceding circuit, and so on.
So this leads to a large improvement. And this is, again, the kind of thing you can really only do if you have, essentially, liquid acting like an optimizing compiler. We basically told it the rules for how it could do these manipulations, and then it was able to find these reductions. There's still probably more work to do, because we're not currently doing the optimal ordering of the terms for all these reductions, and we expect we can improve further.
So we got a sequence of improvements in the gate depth. These plots show various numbers of spin orbitals; each point is some particular molecule. Sometimes you'll see the same molecule twice -- like CO2 large and CO2 medium, which are different basis sets for the same molecule. And by doing all of these, we get significant reductions in the number of gates required. So that's the first part.
For the second part, I want to tell you about the second kind of gain we got, which we call interleaving. Interleaving is a reordering of the Trotter-Suzuki expansion in order to reduce the simulation error. We have a strong physics understanding of why it works, and we have good numerical evidence that it greatly reduces the simulation error, but we do not have a mathematical proof of exactly how much it reduces the simulation error.
So what kinds of terms do we have? Well, we have these h_pp terms in the problem; these are diagonal terms that are just the number terms. And the prrp terms -- in our notation, a†_p a†_r a_r a_p -- which are just diagonal number-number interactions between two sites. These terms completely commute with each other. And these h_pp are actually the largest terms in the problem, in terms of their scale.
Then you have a bunch of h_pq terms, which are the hopping terms, and these controlled hopping terms h_prrq -- controlled hopping from p to q, depending on the occupation of r. (I'll get to pqrs in a second.) Now, you have a lot of flexibility as to which single-particle basis you work in, and these packages will give you a basis in which the following identity holds -- this is the Hartree-Fock condition: h_pq plus the sum over all the occupied orbitals r in this initial product state of h_prrq is equal to 0. What that says is that in this particular product state -- some orbitals occupied, some empty at the start -- there's an amplitude to hop from one to another, but the control from the other occupied sites exactly cancels that tendency to hop.
So acting on this Hartree-Fock state, the Hamiltonian doesn't create any single-particle excitations. With the Hamiltonian acting on it, you'll never go from this state, where these are all occupied and these are all empty, to one where a single electron has hopped over. You can create two empty ones and two occupied ones, but you never create just one excitation by acting on it with the Hamiltonian.
So this is a particular cancellation, and they picked this basis because it gives a very good starting point for a lot of calculations. And these terms all commute with each other, so one of the things we can do is group the terms. What we do is first execute all the h_pp and prrp terms; those are all diagonal, they all commute, and we can do them in any order with respect to each other. Then for each pq, we do h_pq and all the h_prrq terms. Those all commute with each other for a given pq, so we can do them in any order -- and they tend to cancel out on average. Previously we had done them in different orders; we had not done this one necessarily right next to that one, and so we were getting a lot of extra error, because we had terms that on average tended to cancel out, but we would do one and undo it later, and depending on what they failed to commute with in between, we would get a much bigger error.
So by doing this grouping -- doing these first, then for each pq doing those, and then finally doing the h_pqrs terms -- we reduce the error by a large amount. What does that mean? Here's the error: the error in the estimate of the ground state energy relative to the exact ground state, as a function of the Trotter number, which is the inverse of the time step in this case. This is simulated for water. This curve is the standard lexicographic ordering, meaning order everything lexicographically, and this is how its error goes down with the Trotter number. And this is what happens with interleaving. (Ignore the diagonal fix -- it's an interesting thing, but I'm not going to talk about it in this talk, even though it will appear on the next slide; I don't think I'll have time.) There is this large dropoff when you go from this spot to this spot: the error enormously drops, and it gives you an over ten-fold reduction in the Trotter number required to get the same accuracy.
Why is it important to reduce the Trotter number required? You have to get to a certain total time -- the time that's one over your desired energy accuracy at the end -- to do the phase estimation. Obviously, if you can do it in bigger jumps, you get there with fewer gate steps, whereas if each gate step advances only a small amount, it takes more gates to get there.
There's a particular kind of fun thing that's like a renormalization-group improvement to this, whose details I'm going to skip. But one thing I do want to mention, which will be useful for the next slide, is that we define a sort of effective diagonal energy, in the same spirit as Hartree-Fock. You might think that this h_pp is the cost to put an electron on a site or take an electron out of a site, but roughly you should also add its interaction with all the other sites that will tend to be occupied, and on average that will be something like this. So this omega_p is kind of a guess as to what the binding energy of some particular orbital is on average, given a guess as to how the other sites are occupied. And this will be useful for the thing I'm about to talk about now.
The thing I'm about to talk about now is a multi-resolution Trotter formula. What this means is that there's perhaps no need for every term to be executed with the same time step; you could execute some terms more frequently and some terms less frequently. So instead of every Trotter step doing a little bit of each term, you take the big terms and do a little bit of them, a little bit of them, a little bit of them -- you have to do them a little bit at a time because they're so large -- but when you hit a small term you just do it all at once, to quite a large amount, and then keep going like that. And there are huge numbers of small terms. So a lot of the terms you could do much less frequently, and this would lead, again, to a large speedup.
So our original idea was to use exactly the magnitude of a term as the factor. And it turns out that in practice this did not work. All kinds of things were tried, and it just did not lead to any improvement in practice. You could try what we called the coalescing value -- coalesce a term by a certain amount, with the small ones coalescing more -- and we never got to a situation where we reduced the work without increasing the error. And largely thanks to [indiscernible]'s incredible persistence in trying to keep looking at different ways of doing it, we eventually came up with something that works and that makes a lot of theoretical sense -- although, again, it makes sense from a physics point of view, with no real math as to exactly why it works.
Instead of ordering the terms just by their actual magnitude, we ordered them by the magnitude squared divided by some energy denominator, sort of in the spirit of second-order perturbation theory. And the energy denominator was obtained from the differences of those omegas. So this gives us another measure of the importance of a term -- I'm just going to call this quantity the importance. And we sorted by this importance: the terms of high importance get executed very frequently, and the terms of very low importance get executed less frequently. We obtained at least a ten-fold reduction in gate depth.
When I say at least, this is a little bit tricky to define, meaning that we came up with rules that work for every molecule we could test. We don't have a quantum computer, so we are limited in the sizes we can test; but we came up with rules for how much to coalesce, based upon the importance and upon the distribution of importances for a given molecule. And those rules were giving a ten-fold gate reduction at the sizes we could simulate. We believe that we could coalesce even more aggressively at bigger sizes -- it seems to make sense, it's very believable -- but we don't have the ability to really check it by simulation. Perhaps one of the things we would do if we had a quantum computer is run a 50-qubit molecule, verify that this thing really worked at that size, and then trust that it worked at 100 qubits or something. But I expect that 10 is actually a rather conservative statement, and it's probably going to be a lot more than that.
As an example, here is the full set of rules that we used for hydrogen chloride. We have some terms of very large importance here, and then these terms here, these B terms -- there are a few terms in here which are a little subtle to handle -- get done every time. Then you have terms that you do every 16 steps, every 32, and every 64, as they get less and less important. And again, another thing we can potentially do is start coalescing more and more: not just every 64 steps but every 128, and so on.
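(A hypothetical sketch of such a schedule -- the magnitudes, denominators, and tier thresholds below are all invented for illustration; the real rules were tuned per molecule, as described above.)

```python
import numpy as np

rng = np.random.default_rng(2)
h = rng.standard_normal(1000) * np.exp(-rng.uniform(0, 8, 1000))  # fake term magnitudes
denom = rng.uniform(0.5, 5.0, 1000)        # stand-in for omega differences
importance = h**2 / np.abs(denom)          # second-order-perturbation spirit

cuts = np.quantile(importance, [0.5, 0.8, 0.95])   # made-up tier boundaries
interval = np.select(
    [importance < cuts[0], importance < cuts[1], importance < cuts[2]],
    [64, 32, 16],
    default=1,                             # most important: every Trotter step
)
print({k: int((interval == k).sum()) for k in (1, 16, 32, 64)})
```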
And the important thing to see -- here I have a plot of the distribution of this importance for a variety of different molecules -- is that as the molecule gets bigger, like when you get up to this Fe2S2, there start to be a very few terms, way off on the right, of very high importance. Those are the ones you have to do every time. But everything else is much, much less important, so you would expect that you can get an even greater coalescing gain. This is something we need to understand in more detail, but our current simulations are showing -- here's the plot showing the reduction in work for a variety of molecules, according to these rules -- and I expect it's going to continue to increase beyond there.
Again, I'm skipping a little bit quickly, because this is a talk I gave in Santa Barbara in an hour and I'm giving it in 40 minutes here. So I'm skipping over parts of some of the slides.
So what are the improvements we've gotten to? Well, we started, as I mentioned, with this power of N to the 11th; and there's also a factor of about 10 to the 4 for the phase estimation. There's the cost of a single Trotter step, and then there's the question of how many Trotter steps you have to do -- what total time are you getting to? Are you getting to a total time of 1, or to a much bigger total time? That gives you some constant overhead if you're aiming for a constant target accuracy.
The improvements were: canceling the Jordan-Wigner strings, nesting, reordering some things, interleaving, and then this coalescing trick. And the bottom line is an asymptotic improvement by N squared, plus lots of other improvements that we can't express as a power of N -- I don't know if coalescing is N, N squared, or what, but it seems to be large. And the final scaling is going to depend a lot upon the Trotter number required. So that's the last thing I want to talk about, and it's very important for understanding the overall scaling.
Those bounds I showed were very pessimistic -- I mentioned this N to the 6th -- and they had the required Trotter number increasing rapidly as the molecule size went up. However, simulations on a variety of different molecules do not show the error going up in any way as the molecule size increases. These are plots of the error as a function of Trotter number for a variety of different molecules. There is a lot of structure and detailed stuff in there, but there's no clear trend with the number of orbitals. And that includes simulations on smaller ones; there's no clear trend.
So I want to talk a little bit about better bounds on the errors. Previous work, as I mentioned, just used things like the number of terms. We have a better bound, which expresses the error directly in terms of sums of norms of commutators -- commutators of terms with commutators of other terms, like h_pqrs with h_abcd with h_efgh. We can just directly evaluate this bound and see what we get. This relies on a specific bound that's not just good to lowest order. That is, there's one way of estimating the error, which I'll show in a slide or two, based on expanding the error out to order dt squared; but this bound is valid not just as a low-order expansion -- it holds to all orders. And you can just go directly and try to evaluate what these norms are.
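(A sketch of what "directly evaluate" means, on a toy Hamiltonian split into random terms; all numbers are invented, and this is only the pairwise-commutator piece of such a bound.)

```python
import numpy as np

rng = np.random.default_rng(3)

def rand_herm(d):
    M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (M + M.conj().T) / 2

terms = [rand_herm(4) for _ in range(4)]
bound = sum(np.linalg.norm(Hj @ Hk - Hk @ Hj, 2)
            for j, Hj in enumerate(terms)
            for Hk in terms[j + 1:])
print(bound)   # commuting or tiny term pairs contribute (nearly) nothing
```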
One of the things you find then is this, because the bound is in terms of norms of these commutators: a term involves four different choices of orbitals, so there are of order N to the 4 of them when you count; but in order for another term not to commute with it, one of its orbitals has to overlap one of the first term's, so there are actually only N to the 3 possible choices for that term, and again N to the 3 possible choices for the next. So one of the improvements is that many of the terms commute with each other.
This gives one of the improvements: when you take this into account, that's where the N to the 10 comes from, and then the N to the 5th. That is, there are actually only N to the 4 choices for the first term, N to the 3 for the second, and N to the 3 for the next, giving a total of N to the 10 possible choices, and this gives you a Trotter number scaling as N to the 5th.
However, another important improvement is that, in fact, as these molecules get bigger, the number of terms gets larger, but the number of big terms does not really get larger. What happens is you get a lot more terms, but a lot of them are small, and all of those small terms don't actually contribute that much to the error. So this gives you another large improvement, because when you actually go plug the numbers in -- either into the previous all-orders bound I showed, or into this bound here, which is a lowest-order expansion -- they actually roughly agree up to constant factors. This one has a much better constant factor, because it's just the lowest order; the all-orders one is worse in the constants, but they give basically the same estimate of the error. When you plug the numbers in, you find that many of the terms in the commutator are actually very small, just because those particular terms in the Hamiltonian happen to be very small.
There are some further averaging effects -- and I'm down to three minutes, so I won't go over them -- which improve things even further: all these error terms enter as an upper bound, which is very pessimistic, and we would expect that in reality they're going to kind of average out.
So this leads to many further improvements, and our guess is that the Trotter number is going to be around either N to the 1st or, at most, N to the 2nd. Really quite slowly growing.
So I want to conclude with a guess at the time that would be required to simulate a molecule like Fe2S2. I mentioned this molecule: it has a basis of an interesting size, a size that's not a ridiculous number of logical qubits, but outside the range of what a classical algorithm can do. It's certainly outside the range that any exact classical algorithm, based on diagonalizing the Hamiltonian or Lanczos-type methods, would ever be able to do, and it's been used by IARPA as a test case. And this is going to be done a little bit in the spirit of a Fermi problem, like counting how many piano tuners there are in Chicago. I'm not going to get the exact number; these are just guesses.
Okay, so what do we have? From before, about 10 to the 7 is the gate depth per Trotter step for this molecule. It's actually currently about two times that, but I think we can probably reduce it a lot with better term ordering. If we say we want milliHartree accuracy -- a Hartree is about 27 electron volts -- we think we need a Trotter number of about 10. That's based on the scaling bounds and on the Trotter numbers that suffice for smaller molecules. Maybe it's more than 10; I would be quite confident it's not more than 20. That is to say, a time step of about a tenth of an inverse Hartree. And then what's the total time we need? We need a total time of about 1,000 inverse Hartree to get to milliHartree accuracy.
Then coalesce by at least a factor of 10, probably more. A bunch of these numbers trade off: you might say, well, you're not really at 10 to the 7 right now, you're at 2 times 10 to the 7 -- but I'm sure we're going to coalesce by much more than a factor of 10. And here's the extremely optimistic number: we'll assume that the gates take a logical time of 10 nanoseconds. If you want to plug in a different number, just multiply my final answer by whatever you plug in there. And what you find is that it would be roughly 100 seconds.
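(The back-of-the-envelope arithmetic behind that 100 seconds, using only the rough figures quoted above.)

```python
gate_depth_per_step = 1e7    # gates per Trotter step (approximate)
dt = 0.1                     # inverse Hartree per step (Trotter number ~10)
total_time = 1000.0          # inverse Hartree, for milliHartree accuracy
coalescing = 10              # at least a factor of 10
gate_time = 10e-9            # seconds per logical gate (very optimistic)

steps = total_time / dt                                       # 10^4 steps
print(gate_depth_per_step * steps / coalescing * gate_time)   # 100.0 seconds
```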
If you want microHartree accuracy, it would be much slower.
But this is actually a very interesting, very short timescale. You know, even if it were quite a bit larger than that -- even if it were not minutes, which is what 100 seconds is, but hours or days -- you'd be able to do very interesting and very useful things that you would not be able to do in any other way.
So I just want to conclude -- you can read the conclusions, but I want to end with one interesting estimate. Take that 100-second timescale, take that energy accuracy and convert the energy to a frequency with Planck's constant, and then convert that to a time. You can then ask how much slower we are than nature, and you find that we are about 10 to the 14 times slower than nature. I don't know if that's a lot or a little; it's sort of interesting just to think about.
>>: The more you can go. [laughter]
Matt Hastings: Okay. Thank you.
[Applause]
>>: So you discovered a lot of optimizations specific to the problem you did. Can you infer from this that some of these optimizations might also be useful for other classes of problems? In the world of designing quantum circuits or quantum algorithms, we could have optimizations that are not problem specific, optimizations that are problem specific, and maybe some for special classes of problems?
Matt Hastings: Yeah, that's a great question. Actually, I think I really did take that lesson from it -- I think we all took that lesson from it. It's great being able to actually test this stuff: you can see how some [indiscernible] minor improvement does nothing, or how something theoretically great leads to essentially no improvement -- it's really fun to think about, but doesn't lead to much at all -- while some other ones lead to large improvements, and you're able to check it all out. I think there is a large amount to be done there, and I think we need to maybe start taking more of that approach and using simulation to do it.
>>: One other question. So you know in some
scenarios basically [indiscernible] right, so
actually we can [indiscernible].
Matt Hastings: Yeah, no, that's a good question. The CNOT-string reduction does not reduce the total number of single-qubit rotations required, and those certainly might be the most costly part. It depends upon your platform -- on platforms like the Fibonacci one, the CNOTs themselves will also be kind of hard to do -- but certainly on many platforms the rotations are the costliest part. However, the CNOT reduction is what allowed us to do the nesting, and that will lead to a reduction in the parallel depth for the single-qubit rotations, which would still likely be useful on many platforms. So there's a reduction there. And then otherwise, all the other improvements having to do with term ordering and coalescing directly reduce the number of single-qubit rotations by the same amount. So that would help that too.
>>: So the empirical reduction in the error for a given number of Trotter steps -- were you able to fully explain it using the shortcuts and the improvements, or is there some kind of [indiscernible]?
Matt Hastings: The empirical reduction in Trotter number -- were we able to explain --
>>: In the simulator, the math does back up exactly what we saw.
>>: So he's wondering if there's maybe more headroom that isn't explained yet theoretically, that we saw in the simulator. We're not big enough in the molecules to really get any closer; I think we're about as close as we can get.
Matt Hastings: Yeah, I mean, currently those bounds, the upper bounds, give numbers that are significantly higher than the actual error. So yes, there are some issues: if the bound tells you it scales this way and you're down here, it's possible that the error is going to come up, hit the bound, and start going that way, so you might wonder. But the reality is that, against the way those bounds scale, our empirical results are not only lower but scaling better. Yes, I think there is probably more to understand, but we're certainly, you know --
Krysta Svore: Great. Let's thank Matt again.
[Applause]