>> Krysta Svore: Okay so welcome everyone. Today we have Nathan Wiebe
visiting the [indiscernible] group. He is a candidate for a researcher
position in [indiscernible]. He is currently a post-doctoral fellow at
the Institute for Quantum Computing at the University of Waterloo.
Today he is here to talk to us about progress towards a killer
application for quantum computation. So, Nathan thank you for coming.
>> Nathan Wiebe: Great, thank you very much for the introduction
Krysta. So as Krysta said I am going to be talking to you today about
the quest to find a killer application for quantum computation. The
idea for this talk came from a question that was asked of me by two
different colleagues at two different conferences. This question is, “If
I were given a quantum computer with 100 qubits in it, what would I use
it for?”
And this question got me thinking and made me wonder whether we have
actually found something that realistically is a killer application for
quantum computation yet and if not what would such an application
ultimately look like? Now the notion of a killer application is to some
extent sociological. It depends on what problems you think are
interesting or what problems you think other people find interesting.
So in order to preface the opinions that are contained in this
presentation, I really think it’s necessary to talk about the sorts of
research problems that I personally find interesting.
My research touches a bunch of different areas involving quantum
simulation, adiabatic quantum computation, quantum algorithms,
numerical analysis, and foundations of physics and Bayesian inference.
So it might come as a bit of a surprise, given this, that my vision of
a killer application looks something like this. This is VisiCalc, a
program that is widely regarded as the first killer application for
home computers. This simple application single-handedly justified, for
many people, the purchase of a very expensive home computer in 1982.
So from my perspective, something that wants to call itself a killer
application for a quantum computer should spark people’s imaginations
at least as much as VisiCalc did. If it doesn’t then I would not
consider that to be a killer application. So in this talk I am really
going to put candidates forward for potential killer applications and
then discuss whether or not I feel that these are adequate or close to
being killer applications. And these two candidates that I am putting
forward are: Quantum Simulation and Quantum Data Fitting.
The outline of my presentation will involve an introduction to quantum
simulation, and then I am going to present a new method, based on
non-unitary simulation, that Andrew Childs and I developed and that
improves upon existing methods used in, for example, quantum chemistry
simulations. Then I am going to discuss data fitting. In particular I
am going to give an introduction to the linear systems algorithm on
which the data fitting work is based. Then I am going to present the
algorithm, after which I will conclude and discuss whether either of
these actually is a killer application and if not how close we are.
Quantum computation really began with thoughts from Richard Feynman and
Yuri Manin that suggested that quantum information could be stored and
manipulated in quantum systems themselves. Now on the surface this
actually doesn’t sound like all that deep of an insight. After all,
Moore’s law has caused us to miniaturize components to get them smaller
and smaller, so this does sound like it may just be a replication of
the same theme, just smaller.
But actually it’s a fundamentally different idea. The reason is that
the information has to be stored in a manifestly quantum fashion, that
is, in a way that defies a classical description. The way that this is
generally done in quantum computation is that we have bits that are
then generalized into quantum bits, or qubits. And unlike classical
bits these quantum bits can be in an arbitrary superposition of zero
and one.
So the difference between this and maybe a full analog computation,
well one of the differences at least, is the fact that when you measure
this you get a discrete outcome. So every time you measure the quantum
bit it probabilistically will either be 0 or 1 and the probabilities
depend on the amplitudes a and b. You can of course repeat the same
idea with more quantum bits. With 2 quantum bits your system can now be
in an arbitrary superposition of 4 different values. And obviously if
you had 3 qubits then it can be in an arbitrary superposition of any of
8 different combinations of 0 and 1.
So that’s how this works. And of course a quantum computer also has to
be able to transform these quantum bit strings, at least in principle,
into any other possible quantum state within the quantum computer. So
the question is: where does quantum computing potentially get its
advantages over classical computation? And there are many different
candidates that one could use for this. But, to some extent one of the
biggest things is this exponential growth of the amount of information
that’s required to represent the quantum state.
To give you an idea about how this exponential ends up scaling I have
given the following diagram. Imagine that you had a single quantum bit
and you could write down all the information you needed in order to
represent that quantum bit within some fixed error tolerance on a
single grain of sand. Then of course the relevant unit is, “Well how
many qubits would you need in order to make a pile of sand the size of
a Tyrannosaurus Rex”? And that turns out to be 48 qubits. If you want
to look at a quantum computer that has twice as many qubits, 96, then
the pile of sand you would need to represent that quantum state would
have about the volume of a small moon. Going up again by a factor of 2,
that pile of sand would have to be the size of a small planetary
nebula.
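As a rough illustration of that bookkeeping (a sketch of my own, not from the talk), the classical description of an n-qubit pure state is a vector of 2^n complex amplitudes, so the memory needed doubles with every added qubit:

    for n in (1, 10, 30, 48):
        amplitudes = 2 ** n
        gigabytes = amplitudes * 16 / 1e9   # 16 bytes per double-precision complex amplitude
        print(f"{n} qubits -> {amplitudes} amplitudes (~{gigabytes:.3g} GB)")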
So, obviously this problem that I began with is this question of what I
would do with a 100 qubit quantum computer. This is not a trivial
computational device. This device would require a lot of classical
information in order to specify what’s going on inside the system. So
you ought to imagine that quantum computation should be useful for a
host of different problems. And a number of algorithms are known for
which speedups over the best known or best possible classical
algorithms exist.
And of these I am going to focus only on those that offer superpolynomial or exponential speedups. The reason is that obviously we
really want something that is much better than a super computer. So of
these there are a number of examples. And the two that I consider most
promising are of course quantum simulation and linear systems
algorithms. And I am going to be discussing those in detail later on.
So let’s start with quantum simulation and the quantum simulation work
that Andrew and I have done. So the idea of quantum simulation is
actually really straightforward at its core. What you have is you
have a quantum computer which intrinsically is a quantum system. And
you have the ability to perform arbitrary transformations on the
quantum state inside that quantum system. So, in principle, other
physical systems are themselves quantum systems.
So you would imagine that the transformations that, say, a chemical
system naturally undergoes could be performed on a quantum computer at
the logical level. Similarly the quantum transformations that occur in
condensed matter systems ought to be emulated by a quantum computer of
an appropriate size.
And this is the idea. You basically use the fact that a quantum
computer is a universal device to emulate the natural quantum
evolutions that exist in other systems. And you can actually do this
efficiently whereas on classical computers there is no known way in
general to simulate arbitrary quantum systems efficiently. So you
could do some kind of amazing things if you were able to get a
100-qubit-scale quantum computer.
You would be able to simulate reaction dynamics, and learn spectral
properties of chemicals without ever having to synthesize them. You
could use these devices in order to investigate models for exotic
superconductivity or look for quantum phase transitions in certain
condensed matter systems. There are many hard and important scientific
problems that quantum simulation could be used to address right now.
>> Will there be some questions that you cannot answer in a regular
ordinary way of [indiscernible]?
>> Nathan Wiebe: So what do you mean, sorry?
>> You said you can answer some questions about chemicals without
synthesizing them.
>> Nathan Wiebe: Right, right.
>> But I would imagine there are some questions that you cannot solve
even by synthesizing these chemicals.
>> Nathan Wiebe: Right, right. I was specifically referring to
spectroscopic properties. I mean obviously if you synthesized it then
you could just throw it through whatever spectrometer you wanted and
then look at that. But you couldn’t get information about reaction
dynamics or maybe more sensitive pieces of information from it.
So one of the key things about doing quantum simulation on a quantum
computer is that a quantum computer can perform error correction, and
it can actually certify that the answer that you get out of the
simulation is valid. So that’s one of the very neat things that you can
do with quantum simulation that you may not be able to do with other
approaches, say trying to build an analog experiment for one of these
systems.
And also the number of logical qubits that you need in order to equal
or exceed the power of a conventional super computer isn’t very big.
You only need something on the order of 40ish qubits in order to start
getting to the regime of existing super computers. And also unlike
many of the other quantum algorithms that have been proposed so far,
this family of algorithms solves a host of problems that
scientists actually care about deeply right now. And a lot of
processor time is spent currently on solving these sorts of problems.
>> So can you clarify what you mean about error correction and
certification? Maybe you can give us an example of what you mean by
that.
>> Nathan Wiebe: You know what; actually the next slide probably gives
me a better prop for describing this. So --.
>> [inaudible].
>> Nathan Wiebe: Yeah, sure Matt.
>> Maybe you are going to talk about this in a bit because you said you
were going to talk about [indiscernible] theories. Two things I was
curious about when you mentioned simulating small molecules on the computer.
The first question is that on a quantum computer we can simulate time
dynamics, but we might be interested also in having to cool the
molecule to its ground state and this might [indiscernible]?
>> Nathan Wiebe: Well there are several ways. Technically there is
actually no overhead in bath qubits required, because you can
always in principle simulate an adiabatic evolution to cool it into its
ground state and you don’t require any additional qubits for that. The
bath, for the question about the bath, the worst case scenario you
require one additional qubit. You can effectively do a single qubit
phase estimation algorithm and use that to cool it.
>> And then a separate thing is whether you might try to do, well I am
just wondering how you are going to encode [indiscernible]? You know
100 qubits in terms of the spin system that’s great. That’s like 100
spins, but in terms of a system of electrons that’s not actually that
many degrees of freedom. I mean if you have [inaudible].
>> Nathan Wiebe: Sure, I mean it really depends on the way that you are
doing this. So for example if you are talking about simulating a
quantum system in some first quantized form then the way that you would
have to probably encode that is you would begin by saying, “Okay, well
the electron cloud is distributed according to say an S orbital at the
beginning of that”. Then you would encode your information efficiently
with that. And then you would require a quantum algorithm to prepare
the initial state as an S orbital and then evolve in first quantized
form.
That of course isn’t terribly efficient. A more efficient way of doing
it is to, if possible, map it to a second quantized form and use
creation and annihilation operators to deal with that. Then you only
have to abstractly specify, at the beginning of the algorithm, the
state that the electrons are in.
>> So I am just wondering, and maybe we can talk about this later, in
terms of first quantized, second quantized, the [indiscernible] sounds
huge, but at the same time when you ask how many orbital’s electrons
can have and you count the number of possible spaces for you in a
fairly small [indiscernible] it’s already getting up there to
[indiscernible].
>> When you get into a second quantized form you only get 50 orbitals.
At best you still need to leave some things off for phase estimation
[indiscernible]. So that’s the size we can still do classically.
>> Right, so I am wondering --.
>> You are [indiscernible] on the edge, but the number I tend to use is
a good 200 at least because now you are beyond what you can do
classically.
>> Nathan Wiebe: Yeah, for these sorts of things 40-100ish qubits is
certainly right for spin systems. That’s going to be where the state of the
art is I think. Whereas yeah, there are certainly overheads that come
in with the chemistry simulations, but we can discuss that later.
So going back one more on the stack to the question about the
certification and why I was bringing that up. One of the big trends in
experimental physics these days is to look at so called analog
simulators. And this is justified by the fact that building a full
quantum computer seems to be a very difficult thing to do. But quantum
mechanics is just something that happens naturally in the lab. So why
not just construct a gigantic lattice of hundreds of interacting atoms and
call that a quantum simulator? Well, actually it is quantum and it is
simulating something.
But the question is: is it simulating something other than itself? And
if you want to use this in order to answer a computational question,
what do you need to do? In this case, certifying this system of several
hundred qubits, which was taken from the ion trap group at NIST, what
you need to do, at least at a naive level, is you need to first
understand what the state is, and that may require tomography, which is
going to be exponentially expensive. So there is no known way of
getting these sorts of simulations which are already very complicated
and very sophisticated to actually solve genuine problems.
So that’s one of the reasons why I emphasize the certification issue,
because you know there is this false dichotomy where you say, “This
digital simulation experiment that people have done is only 6 qubits,
why are they bothering when you can do a few hundred qubits”? Well the
reason is because to some extent you can trust what the output of these
sorts of experiments is. Whereas with this one, who knows.
>> I see, so what you are saying is because you can break a quantum
computation up into very small parts and then do an algorithm then you
can certify. Rather than having a large system that you just say does
something.
>> Nathan Wiebe: You just pray that, according to your understanding of
the laws of physics, it ought to be the system that you think it is.
>> As just a side note, Burt and I have been looking at tomography as a
[indiscernible] map. [indiscernible].
>> Nathan Wiebe: I am going to get to that point with my data fitting
algorithm later.
So that’s, enough said.
So let’s talk about basically how this emulation process inside a
quantum computer actually works out. There are several steps. The
first thing is what you have is you have this quantum system and this
is the thing that you would like to simulate. And you have your
quantum computer, which is the device that you are using to simulate
it. In general you are going to require more qubits for your quantum
computer than the quantum system, but for spin systems it turns out
they can actually be the same size and [indiscernible] often.
So the way that the algorithms begin is you want to simulate the
evolution of some initial state forward in time to a final state in the
actual quantum system. You begin by taking that state and encoding
that as a qubit state inside your quantum computer that’s logically
equivalent to the initial state. Then this continuous time evolution
that you would see in the physical system you approximate by a series
of discrete gate operations, which map the system to a final state.
And that final state ought to be logically equivalent to the evolved
state up to some error tolerance. And that’s how these algorithms
fundamentally work. And if you design this right then you will be able
to know beforehand what that error can be in a worst case scenario.
So unfortunately this isn’t the whole story, because getting the final
state as the outcome of this thing is like getting a fortune cookie that
contains the answer to all of life’s problems in it. It doesn’t really
mean too much until you extract information from it. So that final quantum
state that you get here it doesn’t solve any problems. What you have
to do is you have to measure that, which destroys the state and then
start the simulation protocol all over again to get more information.
And repeat this potentially a large number of times in order to learn
the information that you wanted about that quantum system.
Yes?
>> So concerning certification: since a quantum computer makes errors as
well [inaudible].
>> Nathan Wiebe: You know if testing is available. So let’s say you
actually have the quantum system you are entirely right because of the
strong analogy between the quantum computer and the other system, yeah,
why not. If you have got the actual system the very best simulator of
that system is the system itself. If you want to learn something about
it, just experiment on it. But the problem is, you don’t often want to
use simulation in order to understand a physical system. What you want
to understand is a mathematical description of a physical system.
And that’s what a quantum computer can do. A quantum computer can say
that if you designed this simulation algorithm properly then the
mathematics that describes this time evolution should be simulated
accurately by the quantum system.
>> You can also ask questions you can’t do with a physical system.
>> Nathan Wiebe: Yes, and that’s the secondary benefit. There are
certain experiments that would at least be impractical to do in a
physical system.
So I will just do a very brief example of a spin system type problem,
to give you an idea about how the simplest variants of these quantum
simulations work. This is a model known as the transverse Ising
model for 2 interacting spins. It’s a model used in condensed matter
physics to describe quantum magnetism. And the basic matrix you could
just imagine is a 4x4 matrix that has an interaction between the Z
components of these two quantum spins. And also it has an interaction
with an external magnetic field that’s pointing in the X direction.
So if you want to simulate that it boils down to the question of how do
you end up taking this quantum mechanical time evolution operator which
generates say this transformation into a series of gate operations on
the quantum computer? And well in this case I guess it’s a 4x4 matrix
so you probably could do it directly, but in general for higher
dimensional systems it’s hard to actually synthesize this in a
straightforward way.
So the way that this is done is using Trotter decomposition. So the
idea is that this Hamiltonian is the sum of 3 terms. Each of these it
turns out can be efficiently implemented on a quantum computer, but
together it’s not clear how you would do it. So what you do is you
say, “Okay, we break up this time evolution into a series of very short
time steps and in each of these time steps you evolve only according to
one of these 3 terms”. And by increasing the number of time steps you
can make this approximation arbitrarily accurate.
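A minimal numerical sketch of this splitting (my own illustration; the two-spin transverse Ising Hamiltonian and the values of J, B and t below are assumed placeholders):

    import numpy as np
    from scipy.linalg import expm

    I2 = np.eye(2)
    X = np.array([[0.0, 1.0], [1.0, 0.0]])
    Z = np.diag([1.0, -1.0])
    J, B, t = 1.0, 0.5, 1.0
    H_terms = [J * np.kron(Z, Z), B * np.kron(X, I2), B * np.kron(I2, X)]
    U_exact = expm(-1j * sum(H_terms) * t)

    for r in (1, 4, 16, 64):                      # number of Trotter time slices
        step = np.eye(4, dtype=complex)
        for Hj in H_terms:                        # evolve under one term at a time
            step = expm(-1j * Hj * t / r) @ step
        error = np.linalg.norm(np.linalg.matrix_power(step, r) - U_exact, 2)
        print(r, error)                           # error shrinks roughly as 1/r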
So if you do that you can actually find a quantum circuit that’s
equivalent to these operations and this quantum circuit is down here.
But unfortunately we are not done yet, because there are actually two
halves to these sorts of simulation algorithms. The first half is what
I have described here. It’s this process of Trotterization and
breaking it into elementary rotations. The second half is converting,
or compiling these elementary rotations into fundamental gates that the
quantum computer could actually use.
There are a number of different gate sets that can be chosen, but
common choices are pi/8 gates and Hadamard gates. There are many
methods that are known to synthesize these single qubit rotations that
appear in this sort of a circuit. And of course the groups at
Microsoft Research are world leaders on these sorts of techniques.
Now let’s talk about what the state of the art methods are at present
in quantum simulation. The most common approaches that are often used
are these product formula approaches, which are exactly the same sort
of thing that I showed you with this example of the spin system. It’s
just more sophisticated, they often will use higher order Trotter
formulas than the one that I presented. They are very good for high
accuracy simulations of extremely sparse Hamiltonians. More recently
methods based on quantum walks have been developed, largely by Dominic
Berry and Andrew Childs. And these methods are unfortunately not as
good for high accuracy simulations, but they are much better for
non-sparse Hamiltonians.
The methods that I am going to be discussing in this talk are multi-product
based methods, which actually are superior to these product formula
approaches in almost every way. I am sure the group will be able to
tell me very quickly what the one way they are not superior is, but I
will leave that as a surprise.
So with product formulas, the best known scaling is this: here M is
the number of elementary Hamiltonians that
you have in your Hamiltonian. Again, the Hamiltonian is like your
quantum energy operator. So it scales like quadratically with the
number of elementary terms that you have in your Hamiltonian and it
scales nearly linearly with the norm of the Hamiltonian times the
evolution time.
>> [inaudible].
>> Nathan Wiebe: Yeah, it’s the maximum norm of all the little
[indiscernible], so M times that will clearly be an upper bound on the
norm of the Hamiltonian.
>> Are each [inaudible], are they supposed to be in some certain way
not interacting with each other?
>> Nathan Wiebe: Oh no, they can interact with each other. But the
point is that each of these individual terms has to be individually
simulatable. Okay, so they don’t necessarily have to be non-commuting.
So the final thing over there is the error tolerance. And these
algorithms scale sub-polynomially with the error tolerance that you want
out of the simulation, but unfortunately not polylogarithmically.
So that’s the basic intuition behind this. And in order to get this
sort of performance you can’t just use the basic Trotter formula that I
showed previously. You really have to choose higher and higher order
Trotter formulas as the evolution time and the error tolerance end up
becoming more stringent.
>> What’s the big O tilde?
>> Nathan Wiebe: Oh, big O tilde, what I have done is I have dropped
terms that are logarithmic in here, just to make the expression look
nicer.
>> [inaudible].
>> Nathan Wiebe: Yes, most [indiscernible] logs.
>> Why are you saying this is sub-polynomial?
>> Nathan Wiebe: The reason why is because obviously if I didn’t have
this square root here it would just be like e to the log of one over
epsilon. So that would just be, you know, 1 over epsilon. But when
you have the square root in there this actually makes it smaller than
any polynomial function. So that’s why this is sub-polynomial.
So the basic intuition for how you generate these high order Trotter
formulas is like this. Say you have your time evolution operator and
what you want to do is you want to write this as a product of
elementary time evolutions that you can actually carry out. And you
want to choose the times for these elementary evolutions so that you
reconstruct the Taylor series of the actual time evolution operator
within some prescribed error.
This is actually a hard task to try and find all of these different
times that are necessary in order to actually reconstruct the Taylor
series. There is actually a cottage industry in numerical analysis for
finding different times in order to do this. However, there is
fortunately a very nice recursive scheme that Suzuki invented in order
to refine a low order approximation into a higher order approximation.
And this iterative scheme basically works as follows: you start with a
low order approximation and you do two times steps forward with that
low order approximation. Then you do one time step back for a certain
value of time. Then you do two more time steps forward. And with this
two forward, one back, two forward procedure, by choosing this single
parameter rather than all of these different parameters individually,
you can actually find a neat way of guaranteeing that you will increase
the order of accuracy by two by doing this, at the cost of increasing
the number of terms in your approximation by a factor of 5. So the
trade-off is you have 5 times as many exponentials, but you get two more
orders of accuracy out of doing this.
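For reference, one common way to write Suzuki’s recursion (my notation, not taken from the slide) uses the single parameter s_k = 1/(4 − 4^{1/(2k+1)}); the sketch below reuses H_terms and expm from the earlier Trotter sketch:

    def symmetric_trotter(H_terms, t):
        # The symmetric (second-order) Trotter formula: a palindromic product of half-steps.
        fwd = np.eye(4, dtype=complex)
        for Hj in H_terms:
            fwd = expm(-1j * Hj * t / 2) @ fwd
        back = np.eye(4, dtype=complex)
        for Hj in reversed(H_terms):
            back = expm(-1j * Hj * t / 2) @ back
        return back @ fwd

    def suzuki(H_terms, t, order):
        # Two slices forward, one back, two forward: the order rises by 2, the cost by a factor of 5.
        if order == 2:
            return symmetric_trotter(H_terms, t)
        s = 1.0 / (4.0 - 4.0 ** (1.0 / (order - 1)))
        f = suzuki(H_terms, s * t, order - 2)
        b = suzuki(H_terms, (1.0 - 4.0 * s) * t, order - 2)
        return f @ f @ b @ f @ f

    t_slice = 0.5
    U_slice = expm(-1j * sum(H_terms) * t_slice)
    for order in (2, 4, 6):
        print(order, np.linalg.norm(suzuki(H_terms, t_slice, order) - U_slice, 2))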
>> [inaudible]
>> Nathan Wiebe: Sure.
>> Does T turn to 0 somewhere?
>> Nathan Wiebe: Yes it does. So here what I am doing is I am just
talking about analyzing one of the short-time slices for the time
evolution. If you are looking at a long-time evolution then imagine
just taking R slices of that and making each of them short and then you
use one of these high order approximations.
>> So the slice is tiny, but it also gets partitioned some more.
>> Nathan Wiebe: Yeah, it gets partitioned some more into these smaller
bits out here.
And so I would just like to give you guys a visual way of understanding
how these product formulas work because our approach in contrast is
going to do something very different. So here these boxes, what they
do, this represents a Taylor series. This first box is a 0th order term
in a Taylor series for the time evolution operator. This is the 1st
order, this is the 2nd order and then these two have errors in them.
And what the Trotter Suzuki Formula does is it combines them in an
appropriate way such that when you multiply these boxes together these
errors over here end up getting a negative sign because of the fact
that you are looking at a backwards in time evolution. The products
between all of these end up causing these terms to interfere with each
other and cancel out. And a symmetry consideration ends up causing
these high order terms to also cancel out when you put it in this form.
So that’s basically how this works.
One of the big drawbacks, though, if you just take a look at what
happens with the errors whenever you multiply, is that every time you
multiply, new types of errors are created by multiplying terms that were
previously correct with error terms. So for example let’s take a look
at the second error terms here. Error terms of that scale
can be formed by multiplying this correct term by that incorrect term.
And thus you will generate a more complex set of errors through
multiplication than you would otherwise.
So a lot of the effort you can imagine conceptually in the Trotter
Suzuki Formula is to actually counteract these errors that are
introduced by multiplication and deal with the fact that multiplying
polynomials isn’t a very natural way to build a Taylor series. The
natural way to build a Taylor series is to add them. And that’s
exactly what we do.
So we suggest doing something very different; don’t multiply. Start
with your lower order formulas. Come up with some weighted sum of
these lower order formulas and you add them together in an appropriate
way to make the Taylor series that you want. And this is very natural
because of course with Taylor’s theorem you just add the individual
terms in the Taylor series expansion to construct it anyways. So this
doesn’t create this problem of propagation of errors.
This sort of an approach has already been known in the numerical
analysis community for quite some time. Richardson extrapolation
is the simplest example of these sorts of approximation methods. And
in general we can construct multi-product expansions that work by
adding together many low order product formulas with different
coefficients out here and construct our approximations that way.
That’s the method that we use here. So rather than these massive
product formula approximations, we add together a bunch of them with
different coefficients in order to approximate the time evolution
operator within some accuracy.
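A small numerical sketch of the simplest such combination, Richardson extrapolation of the symmetric Trotter formula with the standard weights (4[S(t/2)]² − S(t))/3, reusing the definitions from the earlier sketches:

    t_s = 0.5
    U_s = expm(-1j * sum(H_terms) * t_s)
    S_t = symmetric_trotter(H_terms, t_s)
    S_half = symmetric_trotter(H_terms, t_s / 2)
    richardson = (4 * (S_half @ S_half) - S_t) / 3       # a non-unitary weighted sum
    print(np.linalg.norm(S_t - U_s, 2),                  # second-order error
          np.linalg.norm(richardson - U_s, 2))           # the combination is more accurate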
There are some advantages to doing this. The first key advantage is
the number of exponentials that you need to create the formula using
Trotter-Suzuki, you will notice, grows exponentially with the order,
and that’s exactly for the reason that I mentioned. Every time you
increase the order recursively you need 5 times more exponentials,
hence the 5 to the K minus 1. Whereas with multi-product formulas you
only end up needing order K squared terms, which is fantastic. Of course
there is something that is very un-fantastic about this as well.
And that’s the fact that although the Trotter-Suzuki formula is
unitary, which means it is an ordinary quantum transformation that can
easily be done on a quantum computer, these formulas in general are
not unitary. So you have to go through greater effort on a quantum
computer in order to try to synthesize them. However, we end up
discovering ways to do so and actually find, surprisingly, that you can
use non-unitary operations on a quantum computer to simulate unitary
dynamics more efficiently than you could by using the unitary dynamics
by itself.
So the way we do this is using the following circuit. This circuit is
designed in order to create a linear combination of two different
unitaries. So effectively sums of two unitaries with an arbitrary
coefficient kappa in front of it. And the way that you do this is you
take this single qubit quantum transformation here, you apply it to the
input bit and perform these controlled evolutions, where here U0 and U1
you could think of as just two different operator splitting formulas.
This one might take one time step and this one could take two time
steps, but they are the same formula. And if you measure this and
observe 0 then you will have actually performed the correct linear
combination that you want; whereas if you measure 1 then you won’t and
you might have to perform some error corrections.
This can also be generalized pretty straightforwardly to adding more
than two terms just by using a larger unitary on many qubits. And
that’s effectively what we do, but for the purposes of this
presentation I will focus on just adding two terms together because it
captures all the intuition that you need.
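Here is a small classical reconstruction of the arithmetic that circuit performs (my own sketch; the ancilla preparation W|0> = (√κ|0> + |1>)/√(1+κ) is an assumed convention): measuring 0 leaves something proportional to (κU0 + U1)|ψ>, and measuring 1 leaves something proportional to (U0 − U1)|ψ>.

    import numpy as np

    def lcu_branches(U0, U1, kappa, psi):
        d = U0.shape[0]
        W = np.array([[np.sqrt(kappa), 1.0],
                      [1.0, -np.sqrt(kappa)]]) / np.sqrt(1.0 + kappa)    # ancilla preparation
        ctrl = np.kron(np.diag([1.0, 0.0]), U0) + np.kron(np.diag([0.0, 1.0]), U1)
        full = np.kron(W.conj().T, np.eye(d)) @ ctrl @ np.kron(W, np.eye(d))
        state = full @ np.kron(np.array([1.0, 0.0]), psi)                # start in |0>|psi>
        return state[:d], state[d:]      # unnormalized branches for ancilla outcomes 0 and 1

    # Sanity check: with U0 = U1 = identity and kappa = 1, the "0" branch is psi, the "1" branch is 0.
    b0, b1 = lcu_branches(np.eye(2), np.eye(2), 1.0, np.array([1.0, 0.0]))
    print(b0, b1)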
>> So I have a philosophical question.
>> Nathan Wiebe: Yeah?
>> The [indiscernible] here actually needs to be [indiscernible],
right? So how big is the [indiscernible] of information then and why
do you think the [indiscernible] information is sufficient?
>> Nathan Wiebe: The actual qubit remains in a pure state actually
during the entire evolution. The reason why is because when you take a
look at this linear combination of unitaries that you end up getting
over here this is actually just going to map a pure state to a pure
state. So it’s not like you are going to end up getting a mixed state
out of this linear combination.
For example, you know, let’s take the worst possible, most destructive
linear combination that I can imagine for these sorts of things.
Imagine you want to do a linear combination of the identity and Z. What
you end up getting is just a projector onto the 0
state from that. And that just ends up mapping you to a pure state.
So these sorts of combinations will always end up giving you a pure
state.
>> [inaudible].
>> Nathan Wiebe: Yeah, sure.
>> [inaudible].
>> Nathan Wiebe: One more back, okay, certainly.
>> So in cases that you deal with what is the probability of getting 0?
>> Nathan Wiebe: Okay. The probability in the cases that I deal with
depends on the operation. And the reason why, I am going to get to
this in a second, but there are two sorts of errors that end up
happening. It depends on whether U1 is approximately U0 or approximately
negative U0. Those are the two cases that come up in the simulation
algorithm. And in one of the two cases it turns out that if an error
occurs it’s easily correctable.
In the other case it’s catastrophic and you have to throw out
everything. So much of the cost of this algorithm is actually going to
boil down to making sure that the probabilities of these catastrophic
events occurring are vanishingly small. But for the non-catastrophic
ones, a probability of success of about 0.5 is what we found to be
optimal.
>> [inaudible].
>> Nathan Wiebe: Okay. In general U0 and U1 are going to be two
different values of the same --. Let me go back for a second. This is
what it is. SX over here is an approximation to the time evolution
operator. So it’s going to be one of these product formulas. So in
general these are all going to be of the same form, but they are just
going to use a different number of steps. So one of them will be two
copies of the formula with half the time and the other one will be like
one copy of it with the full step. So they are going to be exactly the
same form of approximation, it’s just one of them will involve many
steps and one of them will involve maybe a few.
>> [inaudible].
>> Nathan Wiebe: For different values of cube here.
>> Different.
>> Nathan Wiebe: Yeah.
>> So maybe you said this already, but what do you get if you measure
1?
>> Nathan Wiebe: You know what, how about I go to the slide. All
right. So if I go to the slide, if I measure 0 then what I get is I
get the linear combination of the two unitaries that I want to actually
implement. If I measure 1 then I get the difference between those two.
So this is one of the reasons why I mentioned it depends on whether or
not the two are close to each other or very far from each other,
because if they are very close to each other then this term will be
approximately 0.
And if it’s approximately 0 it’s just going to clobber your state and
you get some garbage left over; whereas if they are opposite to each
other then you will have approximately 2U0, which is a unitary
operation. And furthermore it’s a simple unitary operation that you
have got an operator splitting formula for so you can actually invert
that. And so that’s the basic intuition for this. So for some of the
steps, some of them will be good.
>> But typical probability is about 50 percent.
>> Nathan Wiebe: Typical probability is about 50 percent, yeah, that’s
the best to shoot for.
>> [inaudible].
>> Nathan Wiebe: What do you mean by that?
>> You are removing the first qubit.
>> Nathan Wiebe: You are just removing the first qubit.
>> By measuring it.
>> Nathan Wiebe: The second qubit is where all of the information that
you want is actually encoded. This is just used in order to allow you to
do this trick where you perform a weighted sum of these two.
>> So would you do this as a [indiscernible]?
>> Nathan Wiebe: Perhaps, it’s certainly along the same sort of theme.
>> So you make this unitary transformation just [indiscernible],
discard it, but perhaps one thing to do then instead of trying to apply
the unitary transformation to your actual states you make an
[indiscernible]. If it succeeds then you [indiscernible].
>> You try to recover from the other one?
>> Nathan Wiebe: Yeah, it’s one of the things that I remember we looked
at. We weren’t able to get any traction on that particular idea to use
some sort of a gate teleportation type idea to do this. That would
clearly be ideal, but I will be happy to talk about it later.
>> The problem is that these are not [indiscernible].
>> Nathan Wiebe: Yeah, exactly. All right. So that’s basically it.
As I said before, but now it’s a belabored point, some of these are bad
errors and some of these are not so bad errors.
So just to give you a basic overview of how this would work for the
simplest possible case imagine what you have is you are just using this
Richardson extrapolation formula here. In the simulation you use this
particular formula and here S1 could represent the simplest Trotter
formula you can use, the symmetric Trotter formula. So you use that
formula with the weight of 4 here and a weight of minus 1 there and you
use those circuits in order to implement this particular linear
combination. Again, in general what will happen with this is that the
error correction can be done with high probability in this case because
S1 has opposite sign to that S1.
So essentially how it works is as follows: the flow chart is you
attempt a time step with one of those terms in here. You have two
options, either succeeds or fails. If it succeeds you go onto the next
time step. If it fails, well then you attempt error correction. And
the error correction that you attempt you can do using this exact same
method. It turns out there is a chance of a catastrophic error when
you attempt that error correction step. But that probability can be
made arbitrarily small. So you attempt that and again if it fails you
abort the simulation, otherwise you try that time step again. You
repeat this process until you are done.
>> So you are doing this at every time step.
>> Nathan Wiebe: You are doing this at every time step.
>> And you could have, you know, a million time steps.
>> Nathan Wiebe: You could have a million.
>> And if one of them has a chance of failure abort?
>> Nathan Wiebe: Yeah, the second last time step could be the one that
fails.
Nonetheless when you consider all of these possible things that can go
wrong and all of the additional costs involved in making these errors
small, you notice that what ends up happening is we end up getting an
exponent up here which is smaller than the exponent that you end up
getting for Trotter-Suzuki, just because, even despite all of this,
these problems with error correction and the like, the advantages of
these multi-product formulas are so great that you do end up getting a
benefit out of it. This polynomially ends up reducing the scaling with
the error tolerance over the best known results using just pure Trotter
formulas.
>> That includes all the abort probabilities also?
>> Nathan Wiebe: Yes. And so basically what this means is we have
given a method that is superior to product formula simulation methods
in nearly every way. Furthermore this could lead to improved quantum
chemistry algorithms and also, most importantly I think, it gives a new
way of thinking about simulation that doesn’t directly involve a
logical mapping between the initial system and the final system. The
dynamics effectively that are going on in the quantum computer are
actually not the same sort of dynamics that are going on in the
physical system, because of the non-unitarity. And it says that this
paradigm has some advantages over the traditional paradigm.
But the real question is, “Well is quantum simulation a killer
application”? And I would say that for scientific computation arguably
you can say yes, however for general purpose computation probably not.
Really the gold standard of this talk is VisiCalc. And quantum
computation is not something that is going to get people who are
interested directly --. Sorry, quantum simulation is not going to get
people who are interested in say data analysis to directly get out and
be excited about quantum computation.
So I don’t think it’s quite as compelling for general purpose
computation. But, undoubtedly it could be incredibly useful for
scientific computation. However, the next thing you could think of is
maybe actually because you have such a compelling application for
scientific computation maybe it’s possible that some problems within
the scientific computation community or simulation problems can be
mapped to other classes of problems that are more relevant. And this
is exactly the same sort of intuition that’s used in the Harrow
Hassidim and Lloyd algorithm for solving linear systems of equations.
So this leads to the second part of my talk which talks about linear
systems and data fitting. So the linear systems algorithm really
actually is built out of three components. And the first component is
quantum simulation. The second uses essentially ideas from phase
estimation and amplitude amplification in order to make everything
work. And basically the way that this problem works is: imagine that
some known matrix A multiplied by some unknown vector gives some
known data. And what you want to do is you want to find the unknown
vector. So this is a standard matrix inversion problem that you want
to solve, something that people do all the time.
So obviously if this actually provided an exponential speedup that has
no caveats this would certainly be a killer application. But, there
are some pretty big caveats that we will get to in a second. The way
that you approach using quantum computers in order to solve this
problem is you begin with the basic problem and then you quantize it.
You replace the input vector by a quantum state. And again this
quantum state, because of normalization, will only be proportional to
that input vector. You do lose constants of proportionality with the
way that this is designed.
Then what you do is you invert A by diagonalization. And the way that
diagonalization works is basically it uses phase estimation to store
the eigenvalues of e to the minus iAt in a register. Then you
divide by the eigenvalues in order to implement A inverse. And
that’s how the quantum algorithm essentially does its job.
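To make the eigenvalue step concrete, here is a purely classical sketch (my own, not a quantum implementation, with an assumed toy matrix) of the arithmetic the algorithm performs: move |b> into the eigenbasis of A, which is what phase estimation exposes, divide each amplitude by its eigenvalue, and renormalize.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])                  # a small Hermitian example matrix
    b = np.array([1.0, 1.0])
    b_state = b / np.linalg.norm(b)             # |b>, known only up to normalization

    eigvals, eigvecs = np.linalg.eigh(A)
    amps = eigvecs.conj().T @ b_state           # amplitudes of |b> in the eigenbasis of A
    x_state = eigvecs @ (amps / eigvals)        # dividing by the eigenvalues applies A inverse
    x_state /= np.linalg.norm(x_state)          # the quantum output is only proportional to x

    x_direct = np.linalg.solve(A, b)
    print(x_state, x_direct / np.linalg.norm(x_direct))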
And the key point though is that at the end of the algorithm what you
end up with is a quantum state that encodes the answer
to your problem, not the answer itself. And that’s a big drawback, because in
general the size of the problem is exponential. So if you wanted to
read the output out of this you would have to sample the quantum state
an exponentially large number of times. And you totally lose any
advantage that you could possibly get from this algorithm by doing so.
So that’s one of the major drawbacks of this particular algorithm.
However, if you are interested in this, and I don’t know why
you would be, what you can get out of this is you can figure out
expectation values of your solution to the system of equations. There
may be some application where this is useful, but I don’t know what it
is right now. And furthermore this work can actually be applied to
solving systems of differential equations, but again you don’t get the
solutions there; you can only find the expectation values or other
similar properties of the solutions, which, although
that does have some value, doesn’t give you quite everything that
you would want.
>> But in the case where y is a sequence of 0s and 1s, so then when you
compute [inaudible]?
>> Nathan Wiebe: Sorry?
>> So if lambda --.
>> Nathan Wiebe: I think, let’s go back here.
>> It’s a binary problem, don’t you get [inaudible]?
>> Right, the binary case.
>> Nathan Wiebe: Okay, so if you get --. You will always get a binary
representation, but the way this is actually encoded, generally
speaking, is that the real values of each of these
components will be encoded as amplitudes of the particular values of
this. So for example the first entry in that would be encoded as the
amplitude --. Yeah?
>> [inaudible].
>> Nathan Wiebe: Right, but the amplitudes will be 0 and 1 in that
case. And in that case you may be able to take advantage of the
peculiarities of that particular constraint to learn more information
than you would otherwise, but certainly the general problem is going to
be hard. Maybe with this particular problem that you mentioned there
could be a cunning way to get around it, but I don’t know what that is
for the moment.
So there are two basic problems I guess for this. I mentioned the
output problem of reading the output, but also the input could be
exponentially hard to generate. So you have this really odd situation
in a way with this algorithm, right. The actual hard part on a classical
computer now becomes trivial, but generating the input and reading the
output becomes extremely hard.
>> Well I guess to be fair it would be extremely hard on a classical
computer.
>> Nathan Wiebe: It would be extremely hard on the classical computer
as well, right.
>> But once you [inaudible] it once it doesn’t get destroyed the first
time you look at it. So it’s exponentially reloading to get the
accuracy that you want [inaudible]. That’s the real problem there.
>> Nathan Wiebe: So clearly this isn’t a killer application at all.
The question is, I mean is there something that you can do with this?
Can you generalize these ideas to come up with a different algorithm
that doesn’t have these same drawbacks? And this is something that
Daniel Braun and Seth Lloyd and I thought about when we were
approaching this. And the application that we thought of was quantum
data fitting.
And the basic idea behind the problem is as follows.
Oh, sorry, data fitting to me is such a ubiquitous task in general
purpose computing that if we did get a quantum speedup for this then
certainly this would satisfy the goal of this talk. It would be a
killer application for quantum computation. And so the question is,
“Well, does it work out”? Well let’s discuss the data fitting problem
and how that actually works.
Imagine you have got some function y of x and what you want to do is
you want to represent this as some combination of fit functions, fJ of
x. Your goal is to find weights for each of these fit functions that
minimize the squared error. So that’s the idea and there are many ways
that you can do this. Conjugate gradient methods are a great way of
doing this on a classical computer, but you can also use a linear
algebra approach where you just apply an operator known as the
Moore-Penrose pseudoinverse, and that operator is given down here.
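A classical sketch of that fitting step, with an assumed toy data set and two fit functions (1 and x), where the weights come from the Moore-Penrose pseudoinverse:

    import numpy as np

    x = np.linspace(0.0, 1.0, 1000)
    y = 2.0 * x + 0.3 + 0.01 * np.random.randn(x.size)   # noisy data to be fit
    F = np.column_stack([np.ones_like(x), x])            # columns are the fit functions 1 and x
    lam = np.linalg.pinv(F) @ y                           # the two fit weights
    residual = np.linalg.norm(F @ lam - y)                # a measure of the quality of the fit
    print(lam, residual)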
The key advantage of our method, even though this is actually more
complicated than the inversion problem, is that this matrix F here no
longer necessarily has to be square. So for example you could try to
fit an exponentially large data set to a line, right. In that case
your output dimension would be two; you would have two parameters to fit
a line. So in that case you could actually solve the output problem.
You could efficiently read all the information that you need for sure
in this one particular application.
The input problem still remains unfortunately, but the output problem
could be resolved by this application. All you have to do is apply
this Moore-Penrose Pseudoinverse. I should also mention that this
pseudoinverse operation, in the case where the matrix is actually
invertible reduces to the previous problem considered by Harrow
Hassidim and Lloyd.
So the way we do this is we follow the exact same sort of strategy that
Harrow Hassidim and Lloyd employed. You start with your vectors and
you encode them as quantum states with each of the coefficients stored
as amplitudes of that particular value, or I should say entries. Then
what we do is we use a trick because in order to leverage quantum
simulation we need to have a Hermitian matrix. So we need to have
something that is self-adjoint. And that generally won’t happen for
these fitting problems so we use a dilation of the space. We introduce
an additional qubit in order to make the matrices Hermitian in a larger
dimension.
And after using this trick it turns out that this F dagger F inverse is
actually just F inverse squared. So after using this trick this ends
up becoming that. And so this can actually be executed by using the
Harrow Hassidim and Lloyd ideas three times.
>> Did you say a qubit? Is that one qubit?
>> Nathan Wiebe: Yeah, yeah, you actually only need one in order to
extend it out.
The idea is as follows: you generate the initial data, then you
use Harrow Hassidim and Lloyd to implement F dagger. I should say you
don’t quite use it because you don’t need to divide by the energies in
that case. In that case you multiply by the energies, but everything
else is exactly the same. Then you use the algorithm two more times in
order to implement these F inverses and that’s what you do.
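A quick numerical check of that dilation argument (my own sketch): embed the rectangular F in the Hermitian block matrix [[0, F], [F dagger, 0]]; applying it once and its pseudoinverse twice to (y, 0) leaves the least-squares weights in the second block, which is the “F dagger, then two inverses” sequence just described.

    import numpy as np

    rng = np.random.default_rng(0)
    F = rng.standard_normal((8, 2))                 # a tall rectangular fit matrix
    y = rng.standard_normal(8)

    H = np.block([[np.zeros((8, 8)), F],
                  [F.T, np.zeros((2, 2))]])         # Hermitian dilation of F
    v = np.concatenate([y, np.zeros(2)])
    Hp = np.linalg.pinv(H)
    out = Hp @ (Hp @ (H @ v))                       # apply "F dagger", then the inverse twice

    print(out[8:], np.linalg.pinv(F) @ y)           # the two weight vectors should agree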
The cost of doing so is as follows: the key point to take home is that
it’s efficient, there is nothing that is exponential in this problem, S
is the sparseness of the matrices in question, kappa is the condition
number and epsilon is the error tolerance. So that is, oh, and capital
N is the largest dimension of the matrix F.
So you can do a number of things that are actually kind of interesting
with this. So, yes?
>> How are F dagger and F inverse related?
>> Nathan Wiebe: Ah, how are F dagger and F inverse related? Okay, they
are not directly.
>> [inaudible].
>> Nathan Wiebe: You are thinking unitary. If they were unitary then F
dagger and F inverse would be the same, but in this case it’s not
square. So this is just the conjugate transpose of a rectangular
matrix.
>> No I am just, because normally this whole thing just collapses down
to then an F dagger cubed.
>> Nathan Wiebe: Yeah, yeah.
>> Will that work in the square case? Does it help you at all? No,
because the whole point is the Moore-Penrose pseudoinverse?
>> Nathan Wiebe: Yeah, exactly.
>> Okay.
>> Nathan Wiebe: The whole point was to use the Moore-Penrose
Pseudoinverse in order to write this as a series of matrix
multiplications that you can then carry out on a quantum computer.
So that’s the basic idea. And there are actually a few things that you
can do with this, although the output state in general can be
exponentially large and, for the fitting problem, it could be hard for
you to actually learn the fit. It turns out that you can actually learn
the quality of the fit efficiently regardless of the dimension, which
may come as a surprise. Also, of course, if there are a small number
of fit functions being used you can directly learn the fit.
>> Don’t you still need an oracle [indiscernible] for fit functions?
>> Nathan Wiebe: Okay, so several things. I have not mentioned two
oracles. So there are two oracles, well actually no, effectively there
is one oracle, sorry. The oracle will provide you all the matrix
elements you need for the fit functions in a particular basis or in
fact in general you can imagine it in several bases if you have
different natural bases for each of the fit functions that are used
there. It is also useful to imagine that the input state is generated
by an oracle.
>> So are the oracle costs in this [inaudible]?
>> Nathan Wiebe: Yes, actually these costs that I mention are
effectively the query complexities.
So the way that you would go about learning the quality of the fit is
really straightforward. What you do is you note that F lambda
gives you the approximation to the data set that you have and y is the
precise data set you have. So if you want to compare the two just use
a swap test. The swap test is a quantum mechanical test that allows
you to efficiently determine the difference between two quantum states.
So you repeat this process some number of times and then you end up
finding the quality of this fit by comparing.
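For reference, the quantity the swap test estimates (a standard identity, not specific to this talk): with the ancilla in |+>, a controlled swap, and a final Hadamard, the probability of reading 0 is (1 + |<a|b>|²)/2, so repeated runs estimate the overlap between the fitted state and the data state.

    import numpy as np

    def swap_test_p0(a, b):
        a = a / np.linalg.norm(a)
        b = b / np.linalg.norm(b)
        return 0.5 * (1.0 + abs(np.vdot(a, b)) ** 2)    # probability the ancilla reads 0

    # e.g. comparing a fitted data vector F @ lam against the raw data vector y
    print(swap_test_p0(np.array([1.0, 0.0]), np.array([1.0, 1.0])))   # -> 0.75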
So one of the things that’s actually kind of interesting is that the
cost of this algorithm is less than that of the algorithm I gave
previously. The reason why is because the F here clobbers one of the F
inverses that was used in order to construct lambda. So actually
making F lambda is cheaper than making the previous state.
>> [inaudible].
>> Nathan Wiebe: Ah, very good question. So there is only one step
in our approach where it actually pays to do amplitude amplification.
Let’s go back and this is where the [indiscernible] comes from. This
is the only step that we use amplitude amplification in. These two
steps over here, it turns out if you try to use amplitude amplification
after that you have to reflect around the evolved state and that’s just
too expensive. The cheapest thing to do is just use amplitude
amplification here and not use it on those two.
>> So if it’s not square what is F [inaudible]?
>> Nathan Wiebe: Excellent question. So F here, sorry, what I should
be saying here is that these are not the rectangular matrices in the
original problem. These are the Hermitian extensions of the original
matrices. So from this perspective once you have dilated it to a
Hermitian matrix F inverse of that dilation makes sense and it is
square.
>> But F dagger is not F inverse?
>> Nathan Wiebe: F dagger is not F inverse.
>> [inaudible].
>> Nathan Wiebe: Okay. The next thing is if you also want to learn the
value of the fit functions, one of the things that you can do is you can
just use compressed sensing as a tomographic technique in order to learn
what the amplitudes are for the output states. And that cost ends up
coming in as this additive term here, where M prime is the number of
fit functions that you actually use and epsilon is the error tolerance.
Another thing that I should mention that’s really kind of cool is that,
if you don’t know a good set of fit functions a priori, you can actually
find one by measuring the output of the algorithm, because the way that
everything is encoded, it encodes the answers as amplitudes. So the
functions that have the highest
amplitude are the ones that are most significant to the fit and the
ones that you are most likely to measure when you measure the state in
the end.
So what you can do actually with these algorithms is something really
kind of cool. You can say, “All right, I don’t have any idea what a
good set of fit functions would be to represent this data. I will
start with a complete or near complete set of fit functions”. Then you
measure the output state and you sample from that. You find the ones
that appear most frequently and cut out all the rest of them and that
will give you, in some cases, a good guess for a set of fit functions
that you should use for the problem, even if you don’t know a priori
which fit functions are best.
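A toy classical version of that sample-and-prune idea (all numbers below are made up for illustration): treat the normalized weights as amplitudes, sample fit-function indices with probability proportional to |lambda_j|², and keep only the ones that show up often.

    import numpy as np

    rng = np.random.default_rng(1)
    lam = np.array([0.9, 0.05, 0.4, 0.01, 0.02])        # hypothetical fit weights
    p = np.abs(lam) ** 2 / np.sum(np.abs(lam) ** 2)     # measurement probabilities
    shots = 200
    counts = np.bincount(rng.choice(lam.size, size=shots, p=p), minlength=lam.size)
    keep = np.flatnonzero(counts > 0.05 * shots)        # crude cut: keep functions seen often
    print(counts, keep)                                 # the first and third weights dominate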
>> But I still have the classical cost of doing the tomographic work.
>> Nathan Wiebe: The thing is you only have to do actual tomography
later, right. Once you have found the fit functions that you want to
use in the final tomography then you do the tomography. So this
process over here you can think of as like a compression process. You
start with a set of fit functions and you use this to throw out the
ones that aren’t useful.
So just to compare to Harrow Hassidim and Lloyd preparing the initial
state is inefficient with Harrow Hassidim and Lloyd’s approach.
Unfortunately, with quantum data fitting the same thing in general is
true. Learning the output isn’t necessarily a problem with our data
fitting algorithm in many cases and furthermore we can actually
estimate the fit quality efficiently. So this gives you actually
something that’s useful that you can do directly from this particular
application.
Also, going back to something that I mentioned previously a very
natural application of this is parameterizing quantum states, because
if you have a quantum state that’s yielded to you by an algorithm or
some other device then you have already solved the input problem. You
can just use this as an alternative to tomography and get an
approximate reconstruction of the quantum state that way. And yes,
unfortunately it also strongly depends on the sparsity of the matrix in
the basis that you choose for the problem. So you really do have to be
clever about the way you choose it or use good simulation methods.
So is data fitting a killer application? Well I honestly have to say I
think no. The reason why is because of the fact that it still has many
of these problems that the Harrow Hassidim and Lloyd algorithm had. In
order for it to be a killer application, what I would love to be able to
do is to get some random data set that somebody has, just feed it
in, and have it process that random data set incredibly quickly. The
problem is if you have to generate that data set via a lookup table you
are not going to be able to prepare that initial state efficiently.
So what that means is that this can’t be used in order to solve
many general purpose problems that people are interested in, but it
comes agonizingly close. Maybe there are particular problems that
people are interested in, maybe in a cryptographic setting or something
else, where the initial state actually can be efficiently prepared.
And I think that taking a look at studies along these lines may
actually end up leading to the first true killer application for
quantum computation.
Thank you very much.
[clapping]
>> Krysta Svore: Thanks, Nathan. Are there any questions?
>> Let’s think of a time when all the big data in the world will be
stored as quantum [indiscernible].
>> Nathan Wiebe: Yeah, it’s already at a point that when your resources
are already quantum then this, I agree with your remark.
>> So do you have examples of problems where the state [indiscernible]?
>> Nathan Wiebe: Well, yeah, actually I do have problems. This of
course betrays my background, but imagine what you have is you have an
untrustworthy quantum simulator that produces some particular output
data. And you would like to learn what that data is. Well then you
can use that device to generate your input state, run it through this
algorithm and then fit it and learn the fit quality efficiently.
However, I think that there still is more work that probably should be
done. One of the things that we haven’t done is we haven’t looked at
particular sets of fit functions that can be or cannot be generated
efficiently. And that would probably be sort of the next step towards
making this much more practical by identifying some problems that are
concretely useful for people to solve and actually looking at the cost
of implementing the oracles needed in order to perform this.
>> [inaudible]. It was pretty obvious that it was stepping in
[indiscernible]. It’s a step towards universal computing. People like
Bill Gates who went and tried to build a personal computer of course
thought about various applications; it was obvious to them that
[inaudible]. Now here we are still in the stage that even if these are
two killer applications, both of them are still specialized devices
which do a few special tasks far removed from universal computers.
>> Nathan Wiebe: So your question is --. Sorry, I am trying to parse
your question.
>> He is trying to figure out if VisiCalc is a reasonable thing to be
comparing against for the general case?
>> Nathan Wiebe: Well I think it is a reasonable one; the spirit of
VisiCalc certainly is a very reasonable thing to compare against, in
that really, ultimately, what I would like out of a general purpose
killer application for quantum computation is something that really gets
a bunch of people who already have substantial processing needs to be
able to say, “Wow, I really could use a quantum computer, should it ever
come out, in order to handle my daily problems”. And I think for many
of these people who aren’t already in the scientific community, such an
application hasn’t come out yet.
>> So I would say view it as a cloud service, no one is going to have a
machine in their office. But could you sell a service that did their
data analysis much faster and much better for a general case which is
the movement [inaudible]?
>> [inaudible].
>> Well I don’t know how specialized it has to be if you actually solve
the killer application problem. That’s the point, it should be general
[indiscernible]. So are you familiar with [indiscernible] on
[indiscernible].
>> Nathan Wiebe: No I am not.
>> Okay, the late 90s. It’s extremely efficient preparation of lots of
data into states into a circuit model and it might be worth looking at
for some of the preparation models. The other is [indiscernible].
>> Nathan Wiebe: I am very familiar with that.
>> Okay, as I figured.
>> Nathan Wiebe: So that work in particular I feel that some of the
ideas can be used in order to speed this up. In particular their
ability to avoid using amplitude amplification and measurements in the
early parts of the procedure ought to be able to be useful for reducing
this dependence on the condition number in particular. I suspect that
we might be able to reduce this as low as kappa to the third by
adapting their techniques. That’s actually ongoing work right now that
we are looking at to optimize these.
>> And have you thought at all about the fitting questions, the
[inaudible]?
>> Nathan Wiebe: Well again, it really depends on the fitting functions
and the basis in which you want to do this. I haven’t thought about
it in great detail. I know that this is actually a very real and
serious problem, but I think to some extent you need to have an idea of
what you would like to use this algorithm for in the first place in
order to do that. So that’s what we are doing right now. We are
trying to think about what the actual best use cases are going to be
for our algorithm and then attack those.
However, natural candidates for these sorts of fit functions would be
for example bounded polynomial functions that only end up having
support over small areas or trigonometric functions. Those are the
sorts of things that would certainly be very easy to do either with a
quantum Fourier transform or directly using a sparse representation.
>> I think that would, I am just agreeing with you that this would be a
reasonable approach because the machine learning people only pick the
fitting functions they use because they are easy to compute and are
easy to separate, which are the same things. You find things that are
quantumly easy to do. It doesn’t matter what the actual function is as
much as the attributes.
>> Nathan Wiebe: Right.
>> Krysta Svore: Are there any other questions? If not, let’s thank
Nathan again.
[clapping]