>> Krysta Svore: So hi, everyone. Thanks for coming to today's
talk. Today, we have Martin Roetteler here from NEC. He
received his Ph.D. at the University of Karlsruhe in Germany in
2001. After that, he held a postdoc position at the
Institute for Quantum Computing at the University of Waterloo in
Canada. And now he's a senior research staff member at NEC,
which is in Princeton, and he leads the quantum IT group there.
He's published more than 90 refereed journal and
conference papers and is co-author of a book on quantum
information. His research interests are quantum algorithms and
quantum error correction. Today, we'll hear about quantum
algorithms. Martin will tell us about quantum rejection
sampling. So let's welcome Martin.
>> Martin Roetteler: Thanks, Krysta. Thanks a lot for the long,
elaborate introduction. Very nice. I want to talk about
algorithms. In particular, about a problem of sampling. You
all know classical sampling, where you have a probability
distribution and you want to sample from it. In quantum
computing, there is a problem that arises in some specific
situations where we would like to do some sort of quantum
sampling: we want to prepare certain quantum states in the
memory of the computer and maybe then, afterwards, sample from
them or massage them into something else. This talk will deal
with that problem, in some parts quite technically.
I want to start by taking you on a little detour, though. I want
to start with some news coverage you see in the media about a
certain company, and maybe give you my take on it. Maybe that
take is still not fully developed yet. But I want to take you on
a little tour and then arrive at a problem that I'm very
passionate about. It's called the hidden shift problem. And I'm
passionate about it because it relates to a feature of a quantum
computer that I personally think is maybe the most powerful
thing: the ability to do the Fourier transform very, very fast.
And today I want to show you something that maybe you've seen
before, maybe not: you can use the Fourier transform in a way
that helps compute convolutions.
Classically, that's a very important feature of the Fourier
transform. Quantumly, I think that feature has not been fully
leveraged yet, and I want to take you on that trip too and show
you how it can be done.
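As an illustrative sketch (plain Python, with a deliberately naive O(n^2) DFT), the convolution theorem being referred to says that circular convolution of two sequences is pointwise multiplication of their Fourier transforms:

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); fine for a small demo."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
               for k in range(n)) for j in range(n)]
    return [v / n for v in out] if inverse else out

def circular_convolution(a, b):
    """Convolution theorem: conv(a, b) = IDFT(DFT(a) * DFT(b))."""
    fa, fb = dft(a), dft(b)
    return dft([x * y for x, y in zip(fa, fb)], inverse=True)

a = [1, 2, 3, 4]
b = [0, 1, 0, 0]  # convolving with a shifted delta cyclically shifts the signal
c = circular_convolution(a, b)
print([round(v.real) for v in c])  # [4, 1, 2, 3]
```

Convolving with a delta at position 1 cyclically shifts the input by one, which the Fourier-domain product reproduces exactly.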
Let me start with all the hype. You see a lot of articles these
days about quantum computers: are they around yet? Some machines
are called quantum computers where it's maybe not clear yet
whether they're indeed quantum. And Google apparently has
recently purchased one. Lockheed, a few years ago, purchased
one. So I have a few slides at the very end of the talk where I
give my personal take on it. But I don't know. I personally
don't know whether it's quantum or not. I'm very curious about
it. And for the last three or four years, I had to deal with
that situation because our management really wanted to know what
to do with these machines. So I had to form an opinion about
them, and I can share with you what my opinion is. In a
nutshell, I'm curious. I would like to find problems that are
very good for these machines and where we could leverage them,
or I would like to find more evidence that shows that these
machines do not work. In particular, if you scale to too many
spins, there are maybe reasons coming from statistical physics
that tell us that these machines are getting -- maybe that they
will break in some way.
So I want to give you a little bit of a flavor, even though I'm
not a physicist. I'm not like -- I'm a computer scientist. I
can only, like, repeat some of the work we did in our group
regarding that specific analysis.
Okay. Coming back to the question: is it hyped too much, or is
there a breakthrough coming? So I want to talk about the hidden
shift problem; then convolutions, right, how we can use the
Fourier transform to compute convolutions; and then get into
rejection sampling, which is the idea of massaging a state into a
target state, roughly speaking. We analyze that using
semidefinite programming for a very specific problem. And then
we get into some other algorithmic areas.
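For intuition, classical rejection sampling -- the procedure that the quantum version generalizes to amplitudes -- can be sketched in a few lines (a toy example with a made-up two-point target distribution, not the specific problem analyzed in the talk):

```python
import random

def rejection_sample(target_pmf, proposal_pmf, proposal_sampler, scale):
    """Classical rejection sampling: draw from the proposal, accept with
    probability target(x) / (scale * proposal(x)); scale must bound the ratio."""
    while True:
        x = proposal_sampler()
        if random.random() < target_pmf(x) / (scale * proposal_pmf(x)):
            return x

# Toy target pi = {0: 0.7, 1: 0.3}, sampled via a uniform proposal.
target = {0: 0.7, 1: 0.3}
random.seed(0)
samples = [rejection_sample(lambda x: target[x],
                            lambda x: 0.5,          # uniform proposal pmf
                            lambda: random.randrange(2),
                            scale=1.4)              # >= max target/proposal
           for _ in range(10000)]
print(samples.count(0) / len(samples))  # close to 0.7
```

The quantum analogue performs the "accept" step coherently on amplitudes rather than on individual samples, which is where the talk is headed.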
Oh, by the way, feel free to ask throughout the talk. I like it
much better if people ask right away. So any questions up to
that point? Okay. So why do we do quantum computing at all?
Well, the ultimate prize would be to find problems where we can
show exponential speedups. That's the promise of a quantum
computer: to find problems where we can beat a classical machine,
like the best known classical algorithm. That's, I think, the
holy grail of the field, and, of course, there's been a lot of
work done where people show polynomial speedups, maybe a square
root improvement or a two-thirds improvement or so.
These things are also very interesting, but one has to look
much, much more carefully at a concrete instance to see whether
you get a speedup or not. The constants really matter. I
actually just finished, this weekend, my involvement in an IARPA
project -- so I survived the thing -- where we had to look at
massive sets of instances and perform baseline estimation:
given a hardware model, what is the actual cost of the
algorithm? It turned out that the polynomial speedups don't help
much at all in these cases, because there are so many overheads
due to error correction, for instance; you may have to pay
three, four, five orders of magnitude overhead for the error
correction.
But there are other overheads we also have to pay. To make
things reversible, for example, you have to pay an overhead that
usually leads to very big additive terms in our estimates, and
they're so big that, in order to show an improvement, you have to
go very far out in the instance size. If you're interested, I
can talk much more about that too. In a nutshell, we try to
leverage interference in an algorithmic way. We have
computational paths, and we want to bring them to interference to
cancel them out. That's something you cannot do with a
classical, probabilistic computer. Otherwise, a quantum computer
is a kind of probabilistic machine, but it has the extra ability
of cancelling paths that you don't want. You can bring them to
destructive interference.
And so right now, it's known that they are good for some problems
that fall maybe within cryptography. Fair enough to say the
problems where they are really, really good are breaking
RSA and elliptic curve systems and such things. But presumably,
for a real business case, one would want to look at different
problems, because how many of these could you sell if that's the
target application? It's not so clear, right? If the quantum
computer, however, is used for simulation of quantum systems,
arguably there's a bigger market for that, right? More people
would purchase a machine specifically to simulate quantum
systems than the number of people who would purchase a machine
for breaking RSA. My personal belief is, if you indeed get to
the state where you have a large scale computer, people would
probably move to some other schemes for public key cryptography.
Or even if they stay, they would need larger keys and whatnot.
So that's actually another thing that's worth studying: can we
tackle those too?
But I think the real market value, some day, will sit in
simulations. NEC's quantum computer is based on superconducting
qubits, and it's quite small. This is an older layout where the
qubits were very strongly coupled -- they were very close, and
there was a circuit that performed a coupling. Nowadays, there
are other ways to lay out a circuit. Santa Barbara has a group
that's top in that field, and Yale and IBM, for instance. In
Europe, there are various groups. The current trend is not to do
these kinds of very close geometric couplings anymore, because
they couple the qubits too strongly. Nowadays, there are
so-called resonators between them. They're like little cavities,
so you can put the qubits very far away and still be able to
couple them. And that has many advantages.
For instance, you can read out the individual qubits much better
than in this layout. So this one was good to demonstrate
two-qubit algorithms. In a lab, two-qubit algorithms nowadays
can be done with superconducting qubits in the circuit model.
And as far as I heard from people who were at the APS meeting
this year, the trend is now to scale them up. People are very
careful scaling it up, but there are groups who can do three.
There might be groups who can do four, and there might be groups
who can do up to five or six in the near future. So this will
definitely scale up -- not this particular layout, but the one
with resonators. I think it will scale up quickly. We're not
there yet where error correction really would play a big role.
But arguably, within a few years, there will be a time where the
systems are getting into the hundreds and thousands of qubits,
and then error correction will be important. Any questions so
far? That's just to motivate what we're doing.
Okay. So I'm sure many have heard that, for factoring, there is
an efficient quantum algorithm. The best known classical
algorithm is the number field sieve -- the generalized number
field sieve, to be precise. It has a run time that's
exponential in the number of bits of the number you want to
factor. And the best known quantum algorithm has a polynomial
running time. So there's a big gap there. Of course, you might
ask how to actually implement this, and that's a big topic.
But it would break a lot of the public key crypto that's
currently used to set up, to initiate secure communications.
People usually switch to a symmetric cipher once the
communication is initialized, but in the initial stage you have
some asymmetric method. So that will be broken. And some of the
things I worked on were generalizations of this idea. In
essence, these are abelian problems -- things commute. You have
a group, which in that case is just a large cyclic group, and you
have some property of a subgroup of that big cyclic group. In
that case, you want to find a period, so you just specify a
subgroup of that group which you want to find. And then the
quantum algorithm can find a generator of that subgroup, which,
in turn, can be used to factor.
So this idea itself can be abstracted out, and one arrives at a
problem called the hidden subgroup problem. You can suddenly
apply that to many groups. You're not restricted anymore to
cyclic groups or abelian groups. You can apply it to any finite
group. You can even apply it to infinite groups -- you could
apply it to the group SU(2) and so on. People have thought about
all these things.
And why did people do that? Well, there would be some instances
which would be very nice. If you could solve the hidden subgroup
problem completely generally, there would be beautiful things we
could suddenly do. Actually, I have a slide about that later in
the deck. We could do graph isomorphism -- you could decide
whether two given graphs are isomorphic or not. We could do
lattice problems. Certain lattice problems we could answer, like
finding the shortest vector in a lattice of some type. There's
always some little asterisk, a little fine print saying it
doesn't work for all lattices, right; the lattices have to
satisfy some condition.
But we could certainly tackle some of these SVP problems too if
we could do it generically.
>>:
[indiscernible].
>> Martin Roetteler: That's a good question. So the question is
how do we actually specify the subgroup in the general setting.
It always has to be given implicitly by a function that you can
evaluate as a circuit, for instance. The function takes as an
input a group element of a group -- there's a parent group, an
ambient group. It needs to take that group element as input and
output something. The output domain is not really important;
the range is not really important. But it must have the property
that you can evaluate the function. And then you must have the
promise that the function takes the same value on the subgroup.
If it would fluctuate in value on the subgroup, it would
immediately break.
>>: A characteristic function?
>> Martin Roetteler: It's a characteristic function, but it
needs to be more. If it's just -- is there some board I can
write on? If I have just a characteristic function, it's not
going to be enough. That's a very good point, actually.
Suppose that's my ambient group, right? It's a big domain.
Suppose I encode the elements as bit strings of length N, some
kind of encoding. And then these guys would be implicitly
defined by a function. But if it were just a characteristic
function -- so I get constant one here and, say, constant zero
out here -- this approach would not work, actually. Because what
we do is we take Fourier transforms of that function, sort of.
And if it were just a characteristic function, think of it like
this: you have your group G, you have your H here, and you've got
your little bump telling you, right. But you've got all these
zeroes out here. Now imagine that's encoded in a phase
somewhere, right? Phase encoded, that would mean maybe I get a
phase of minus one here and a phase of plus one out here.
If I take a Fourier transform of this, I get a huge peak at the
zero frequency. That's not good. And a Fourier transform, to be
quite honest, really is the only way we can tackle these
problems. We go to a Fourier domain for this, which is not the
standard Fourier transform -- it's a Fourier transform for that
group G. I can talk about that a little bit later. But think of
it as a way to diagonalize the group action in a basis so that
the action suddenly becomes very local. Technically speaking,
there are irreducible representations of that group, but you
might think of them as being frequencies. So from the frequency
picture, you would get a huge peak out here and then maybe a
little bit of signal that tells you what that H actually is.
But if it's just that, it would not work.
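A quick numeric illustration of this point (a toy example over Z_64, not the actual group from the talk): a +/-1 phase encoding of a small subgroup's characteristic function puts most of its spectral weight at the zero frequency, leaving only a thin signal elsewhere.

```python
import cmath

def power_spectrum(x):
    """Squared DFT magnitudes, normalized so they sum to sum |x_k|^2."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                    for k in range(n))) ** 2 / n for j in range(n)]

n = 64
H = {0, 16, 32, 48}                                   # a small "subgroup" of Z_64
signal = [-1.0 if k in H else 1.0 for k in range(n)]  # +/-1 phase encoding

ps = power_spectrum(signal)
print(ps[0] / sum(ps))  # about 0.77 of the weight sits at frequency zero
```

Sampling from this spectrum almost always returns the useless zero frequency, which is why the characteristic function alone is not enough and different cosets must carry different values.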
>>: People can't see.
>> Martin Roetteler: I'm sorry, okay. I was just going to
indicate down here that the spectrum of that guy would have a
huge peak here and then maybe a little bit of signal outside. It
would encode what that H is, but if you just measure from here,
you would typically fall in the big peak and not sample anything
useful. So that's why a second ingredient is necessary, and
that's on different cosets of the group: if you shift it
slightly, you want different values of the function. You don't
want to just have that. You want to have maybe a phase here, a
phase here, and so on.
And that turns out to be enough for abelian groups. If the
ambient group G is commutative and you have that partitioning
here into the characteristic functions of all the cosets, and
they're all different on different cosets, then actually you can,
as if by magic, reconstruct this H. So that's why it's a little
bit restrictive.
Essentially, it's the idea of taking a characteristic function,
that's right. The question is what real world problems we can
map to it. Some kinds of lattice problems can be mapped to it.
Graph isomorphism can be mapped to it. And there's some more.
Sometimes it's kind of contrived, but there is definitely more
work to be done in that particular area.
>>:
[inaudible].
>> Martin Roetteler: It would be very non-abelian. For graph
isomorphism -- if you take isomorphisms of rigid graphs, where
the graphs themselves have no self-isomorphisms -- you would get
a wreath product S_N wr S_2, so you get two copies of S_N.
You're allowed to permute the first graph and the second graph,
but you're also allowed to swap the two graphs. So this would be
the group.
It's kind of nice. The representations are very easy to write
down -- it's just pairs of representations, two types. One type
is given by a pair of representations of S_N, and then you can
induce from it to that group; there's a process called induction
that gives you one type. And there's another type where you have
the same representation on the two factors; then it can be shown
that you can actually extend it to G. So there are two cases:
you can extend it if they're the same, or you can induce it to G
if they're different, say different shapes. So it's completely
understood what the representations are. It's completely
understood what the Fourier transforms are.
What's not understood is what sense we can make of that
non-abelian Fourier spectrum. If the group is this nasty thing,
we suddenly don't have scalars in the spectrum anymore. We have
matrices in the spectrum, because the spectrum looks like some
block diagonal list of matrices. What used to be a Fourier
coefficient is now a matrix. And the question is, if we zoom
into these matrices -- which is what we can do physically; we can
actually assume that we are given one of these matrices, we can
prepare it -- what do we then learn about the hidden subgroup
from just having that matrix?
And that actually is very frustrating for that group in
particular. There does not seem to be much information in just a
single one of these matrices. Actually, we can show that if you
just follow the standard recipe of preparing a superposition,
sampling from here, generating these states, getting exactly
these matrices, there is not enough information there. One might
tweak it -- for instance, say let's take many copies of this and
then do joint processing. There might be enough information
there, but nobody knows how to extract it.
So that's something I would actually like to work on in the
future too: can we come up with ways to -- you probably can't see
that -- make many copies of a state like this, zoom into the
blocks, and then do a joint processing. But not many techniques
are known in general to do joint processing on many registers at
the same time. One is the so-called PGM, the pretty good
measurement, but that doesn't seem to help here.
All right. Yeah, more questions? So actually, some of these
negative results have led to some initial results in post-quantum
crypto. What that means is: let's find schemes that are still
secure even if a quantum computer is around. It may seem
premature to ask that question, but there are definitely people
interested in that. People at NIST, for instance, might be
interested in that.
If a quantum computer is around, what recommendations can we
actually make about what systems to use, right? What is a good
public key system, and what are reasonable security parameters
taking into account these quantum attacks? That's actually a
question people might be interested in. And only very little is
known right now in terms of what alternatives there are against
quantum attacks, but I think it's an emerging field.
Another emerging thing -- and this is slightly different in terms
of what actually needs to be done -- is actually attacking
ciphers. So you might ask: okay, now I have maybe AES, right? I
can observe plaintext-ciphertext pairs. What does it actually
cost to tackle that on a quantum computer? What's the actual
hardware cost if I wanted to break it using a quantum computer,
say using Grover's algorithm? There's no exponential speedup
there, but if I actually had a quantum computer and wanted to
implement that attack, what would be the hardware resources to do
it? That's one question, for instance, tackled in this field of
quantum cryptanalysis. And one can play with certain models,
variations of the model. We just finished a paper where we
showed that if you give the attacker a lot of power -- for
instance, you want to attack a block cipher, and you give the
attacker the power to ask for plaintext-ciphertext pairs
encrypted not under the key but under a bit-flipped version of
the key. That's a model studied in classical crypto. It's
called related-key attacks.
Related-key attacks were used, for instance, to break some
wireless encryption -- WEP, for instance, was broken using
related-key attacks. So you don't get the encryption under the
key itself, but you get the encryption under keys that are
related to the actual key, all right? In practice, of course,
people just use a key, and then you know the key schedule -- the
next key will maybe be an increment by one or so, right. And
that knowledge can be used to break the cipher.
So we recently showed, in a paper with [indiscernible] Stein,
that if the attacker has the ability to ask for the encryptions
under all bit flips of the key, and to do it in superposition,
then we can actually break any block cipher. Any block cipher.
There is only one condition on the cipher, namely that a
sufficient number of plaintext-ciphertext pairs must characterize
the key uniquely. But for actual ciphers, it's actually one of
the design conditions that this holds. For AES, for instance,
two pairs will, with very high probability, suffice: if you view
two plaintext-ciphertext pairs, there's only one key that can
produce them.
That's, of course, not a realistic attack model for actually
breaking a cipher, but we found it interesting to ask the
question: once we go there, what are the interesting attacks, and
how do we formalize these things? So yeah.
All right. Sometimes people joke that there are only two quantum
algorithms around, factoring and search. That's all. I'm always
trying to argue, no, there's more. There are now many known
cases of exponential speedups, and many known cases of polynomial
speedups. But there's a truth in that statement, in that there
are only very few primitives that we really know of in algorithm
design. One stems from the ability to amplify amplitudes by some
iterative process, typically very sequential: we do something
that increases amplitudes we like and suppresses some we don't
like. But it's slow.
And the other has to do with the Fourier transform, right -- the
ability to find periods. And in a sense, most of these
algorithms use these features, or they use them in conjunction.
>>: So is it a hidden shift problem, or a hidden subgroup
problem?
>> Martin Roetteler: In a sense, it's going to be both. In this
talk, it's going to be both. But on that slide, it is a hidden
subgroup problem. Here, for instance, it's the Heisenberg group,
and here, in the affine group, one can find hidden subgroups.
Not for all subgroups of the affine group, but for certain
subgroups, if they're large enough, one can find them. I'm not
sure if there's any shift in this one. No, here HSP always
refers to a hidden subgroup problem. So let's go back to the
Fourier transform. In a nutshell, this is why Shor's algorithm
works -- if you have not seen it before, this gives you a very
quick explanation of how it works.
In order to factor a number, one reduces to the problem of
finding the period of a function. Think of it this way: you can
set up a state in quantum memory whose amplitudes are very
binary -- they're either zero or nonzero, and the nonzero ones
are spaced out in a very regular pattern. You know for sure that
they're always spaced out with a period R that you're trying to
find.
Once you can find the period R, you can factor. You're done;
there's a reduction to that. The only hitch is that you will not
get this comb of peaks exactly, but with an offset that you
cannot control. That's the only downside. Otherwise, you can
set up that state perfectly in quantum memory just by using the
modular exponentiation map.
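The classical side of this reduction can be sketched as follows (a toy brute-force example; the whole point of Shor's algorithm is that the order-finding step is done quantumly instead of by this exponential loop):

```python
from math import gcd

def order(a, N):
    """Multiplicative order r of a mod N, i.e. the period that Shor's
    algorithm finds; this classical brute force is exponential in log N."""
    r, x = 1, a % N
    while x != 1:
        x = (x * a) % N
        r += 1
    return r

def factor_from_order(a, N):
    """Classical post-processing: if r is even and a^(r/2) != -1 (mod N),
    then gcd(a^(r/2) +/- 1, N) gives a nontrivial factor of N."""
    r = order(a, N)
    if r % 2:
        return None
    y = pow(a, r // 2, N)
    if y == N - 1:
        return None
    f = gcd(y - 1, N)
    return f if 1 < f < N else gcd(y + 1, N)

print(order(7, 15))              # 4
print(factor_from_order(7, 15))  # 3
```

So once the period R is in hand, the rest is cheap classical arithmetic; everything hard is concentrated in the period finding.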
Okay. So what to do with that? If you would just sample, it
would be no good, because the next time you set up the state, you
would have a different offset, so you would just sample randomly,
essentially. Shor's idea was to take the Fourier transform of
this. What it does is transform that shift into a linear phase
on top of the delta peaks, right. If you just look at the
amplitudes, they will still be very uniform, spaced out with
1 over R, but the information about the shift goes into a linear
phase, a complex phase that sits on top of these pulses.
When we measure, we don't feel that phase. We pick up a
particular state with probability given by the absolute value
squared of that amplitude, and so we will actually sample a
multiple of 1 over R. And now there is a classical
reconstruction that goes from K over R and extracts that R,
right. If it were just written there as a rational number, it
would be easy, right? You would just look at the denominator,
and the denominator would be it.
It's not as simple as that, because you cannot do the Fourier
transform with an arbitrary length here; you have to work with a
length which is a power of two. So you get something of the form
alpha divided by a power of two. But there's a classical
algorithm, based on the continued fraction expansion of the
number you get, which extracts that R for you.
But essentially, what the algorithm does, and why it works, is
that it's able to forget information about something you don't
want. You don't want the coset information. By taking the
Fourier transform, you can forget the coset information. In
general, that's also what we want to do: we want to go from the
coset state that we can get, and -- we can kind of zoom in
here -- what we would like to do is go back to here, kind of
forget the coset representative.
And what the Fourier transform does is go from that to some
perpendicular space, all right? So if it's abelian, everything's
very nice: a coset goes to just a coset of the perp group. And
then, knowing the perp group, you can go back to the original
group. But in the non-abelian case, there is no such concept of
a perp space. That's why this doesn't work there. But for the
abelian case, that's exactly how it works. This idea works for
general abelian groups. A little bit more mathematically, what
happens is you set up two registers in your memory. The zeroes
stand for some state you know how to create -- actually, there
are many of them; you need n qubits. You initialize them all in
zero, and then you create a superposition of all inputs to your
function. The function is just modular exponentiation. Then you
evaluate it in superposition. You create that state, and then
you can ignore the second register. What you then have is a
state that looks like this. That's all you really needed. The
bottleneck is the implementation of that map F. And there's some
nice work on this -- like Chris in the audience, he works on how
to lay out that F in a very shallow circuit on a 2D
nearest-neighbor architecture, which is actually what you would
have in a physical computer.
And that, in turn, boils down to doing arithmetic efficiently.
If you want to implement the function F in that particular case,
that's what it reduces to. The Fourier transform itself is not
even very costly; that step itself is not very costly. It turns
out that you can take the classical Cooley-Tukey algorithm, which
factors that very dense matrix, the Fourier transform matrix,
into a sparse product. But it's not just sparse -- it's very
tensor sparse. Sparse alone is not enough to be efficiently
implementable. But if you have a lot of tensor products in
there, that's simplifying. That's always a good structure to be
exploited, because in a quantum computer, tensor products come
for free. You only need to implement the tensor factors. Then
you can actually effect a matrix like this.
So that, in essence, allows you to improve over the N log N
algorithm for the FFT. In a quantum algorithm, you can do it
with polylogarithmically many gates. Actually, one can even
shave off a few more factors here. Cleve and Watrous have shown
that you can do this in log N log log N if you want. And there
are various approximations that one can do to the circuit. For
instance, these rotations become very, very small; at some point,
one can just neglect them. One can just prune a lot of these
controlled rotations. And there are many other things that can
be done.
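The effect of that pruning can be sketched by simply counting gates (an illustrative count, assuming the standard QFT circuit with controlled rotations of angle 2*pi/2^k between qubit pairs):

```python
import math

def qft_rotation_count(n, cutoff_angle=0.0):
    """Count the controlled phase rotations in an n-qubit QFT circuit,
    dropping those whose angle 2*pi/2^k falls below cutoff_angle -- the
    pruning idea: rotations between distant qubits are exponentially small."""
    kept = 0
    for i in range(n):
        for k in range(2, n - i + 1):   # controlled R_k gates touching qubit i
            if 2 * math.pi / 2 ** k >= cutoff_angle:
                kept += 1
    return kept

n = 32
exact = qft_rotation_count(n)                                  # n(n-1)/2 = 496
pruned = qft_rotation_count(n, cutoff_angle=2 * math.pi / 2 ** 10)
print(exact, pruned)  # 496 243
```

Capping the rotation index at some k_max turns the quadratic gate count into roughly n * k_max, which is the approximate QFT trade-off being described.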
One can, essentially, compress that circuit to linear depth by
rearranging the gates. And if you have ancilla qubits available,
you can even compress it further. In the extreme case, you can
compress it to a depth of log N, where N is the number of qubits.
But in practice, that's probably not what you want to do, because
then you have a lot of ancillas to keep track of.
Anyway, the bottom line is you should really think of a Fourier
transform as something that can be done at very, very little cost
on the quantum computer. I like to think of it actually as being
no cost -- you can do a Fourier transform for free, like in an
optical computer. I don't know if you know about optical
computing. But in optical computing, the idea always was: if you
have a signal, which is just light, a light wave, it can
propagate through a lens. Then if you're at the right point, in
the focal plane of the lens, this thing will do a Fourier
transform of the incoming wave, and it will do it physically at
the speed of light, almost infinitely fast.
This is sort of an analogy. That's a little bit cheating, of
course, because you have to effect these gates. And doing a gate
might take some time, right, because you need to do it logically,
and there might --
>>:
[inaudible] I understand this is a
>> Martin Roetteler: These are the rotations by one over a power
of two. Yeah, the diagonal entries are something like one, and
then you get e to the 2 pi i over a power of two. Sorry, here we
have 2^n; this is the large one. So --
>>: [indiscernible].
>> Martin Roetteler: So you see, like here, the N is
exponentially large. We need to do exponentially small
rotations. Here, those guys would really be 2 pi over 2^K, with
K large, so exponentially small. I might be off by one here,
actually. Because in the smallest case, if that's the circuit,
you need to do a diag(1, i) here, right? I think it's right.
This one is right. So in the smallest case, if it's just one
qubit, there's no phase -- you do a Hadamard. If it's two, you
need a controlled diag(1, i), the controlled-S gate. If it's
three, you need to do a controlled T gate, and so on.
But if you implement these on a logical level, they still cost,
right, because we have these small rotations, and you guys know
best, right -- they will cause a lot of complexity when you
expand them. So on top of this you would get, say if epsilon is
constant, a constant-factor expansion. But that constant could
be significant, right, and it might even depend on the gate set
that you have available for fault-tolerant implementations.
>>: Would it be possible to use some other Fourier framework,
like [indiscernible] or even the prime factor algorithm? If you
do PFA, then your rotations are in all these different prime
factors.
>> Martin Roetteler: That's a very good point. I thought about
it.
>>: I've been trying to get Krysta to look at it.
>> Martin Roetteler: Right, right. That's a very good question.
The question is, if I understand you correctly, how much of the
extensive classical literature can we import into quantum. It's
all about the N, right? The N could be prime. The N could be a
composite of two primes. The N could be a power of a prime. The
N could be whatever.
And for all these cases, there are many, many algorithms that
take care of it. Here, for instance, there's a method by Rader
which reduces the DFT of length P to a cyclic convolution of
length P minus 1, and then you can actually recurse, right? So
you take your DFT matrix, and you realize that if you permute it
suitably, I don't know if you can see it, but you can permute
this part of the Fourier transform matrix into circulant form by
applying a suitable permutation on the left and the right. Then
it's a circulant, and then you can diagonalize the circulant with
a smaller DFT and recurse.
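Rader's reduction can be illustrated in a few lines. The following sketch (my own, not from the talk) permutes the nonzero rows and columns of a prime-length DFT matrix by powers of a generator of the multiplicative group and checks that the resulting block is circulant:

```python
import numpy as np

p, g = 7, 3                      # prime length and a generator of (Z/7Z)*
w = np.exp(-2j * np.pi / p)
F = w ** np.outer(np.arange(p), np.arange(p))    # DFT matrix, F[j, k] = w^(j*k)

# Rader: index rows by g^a and columns by g^(-b); the entry becomes w^(g^(a-b)),
# which depends only on a - b mod p-1, i.e. the permuted block is circulant.
rows = [pow(g, a, p) for a in range(p - 1)]
cols = [pow(g, -b, p) for b in range(p - 1)]     # modular inverse via pow(.., -b, p)
C = F[np.ix_(rows, cols)]

# circulant check: shifting both indices cyclically by one leaves C invariant
assert np.allclose(C, np.roll(np.roll(C, 1, axis=0), 1, axis=1))
```

That circulant of size p-1 is exactly the cyclic convolution the speaker mentions, and it can then be diagonalized by a smaller DFT.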
So these things, they're kind of interesting classically, and
in quantum, we always have the problem that we have to deal with
the permutations. They might be nasty. We have to implement
them. They operate on exponential spaces. They might still be
costly. But also, like this guy, we could argue we could do
recursively, but the diagonal matrix in here, it's a long time
since I looked at that, but I thought one has to know something
about maybe the quadratic character over that field or so. But
some work has to be done. It doesn't come out of the box.
This one's kind of nice, because it's a tensor product.
>>: That's the one I was really
>> Martin Roetteler: That one is really, that's a home run,
right? In that case, we have the coprime case. We can implement
it by tensor factors. Then how do we do those factors? We would
have to resort to something like this. If we ever run into that
case, we can't do the tensor split anymore, so we have to do a
Cooley-Tukey type of formula, which is good, but it has these
twiddle matrices. So the twiddles, they connect the two factors
in that case.
So essentially, what you get here is a 1 tensor DFT_Q and a
DFT_P tensor 1, but there is a correction matrix here in the
middle. Let me call it T. That thing is a diagonal matrix, and
there's also a permutation similar to a bit reversal. That thing
will eat up some of the cost. But in principle, yes. What I was
interested in as a Ph.D. student, and could never show, is
basically, there's a method called the Bluestein method.
Bluestein allows you to embed any DFT of length N into a larger
DFT of length 2 to the K. So you can always choose a K large
enough that you can implement the DFT of length N as a
sort of a, a few invocations of a DFT of power-of-two length.
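Bluestein's trick can be sketched classically in a few lines. This demo (my own, resting on the standard chirp identity jk = (j^2 + k^2 - (j-k)^2)/2) computes a length-12 DFT via one cyclic convolution of power-of-two length 32 and compares against NumPy's FFT:

```python
import numpy as np

def bluestein_dft(x):
    """Length-N DFT via one cyclic convolution of power-of-two length (Bluestein)."""
    n = len(x)
    k = np.arange(n)
    chirp = np.exp(-1j * np.pi * k * k / n)          # w^(k^2/2) with w = exp(-2j*pi/n)
    m = 1 << int(np.ceil(np.log2(2 * n - 1)))        # smallest 2^K >= 2N - 1
    a = np.zeros(m, dtype=complex)
    a[:n] = x * chirp                                # pre-chirped signal, zero padded
    b = np.zeros(m, dtype=complex)
    t = np.arange(1 - n, n)                          # lags -(N-1) .. N-1
    b[t % m] = np.exp(1j * np.pi * t * t / n)        # conjugate chirp, wrapped cyclically
    conv = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))  # cyclic convolution of a and b
    return chirp * conv[:n]                          # post-chirp gives the DFT values

x = np.random.default_rng(0).standard_normal(12)
assert np.allclose(bluestein_dft(x), np.fft.fft(x))
```

The difficulty the speaker mentions for a quantum analog is visible here: the chirp multiplications are fine as diagonal phases, but the zero padding and postselection steps are not unitary as written.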
>>: But here you don't even need such a strong result. All you
really want is to find the cycle.
>> Martin Roetteler: That's right.
>>: So if the Fourier transform is too long
>> Martin Roetteler: You don't care, exactly. Because if you
don't have the exact length, it will just lead to some broadening
of the peaks, and you're still happy because you'll have the
sample. For some of the convolution based algorithms, I think
it's still fine if you have the wrong length, but then it's less
clear that a widening of the peaks doesn't matter. So it's kind
of, all this
>>: Sounds like research to me.
>> Martin Roetteler: It's research, right. But this is still an
open problem: can we actually embed a DFT into a larger one, up
to some correction? And Bluestein is really nice, because it
embeds the DFT. It again permutes the matrix and then makes a
Toeplitz matrix out of it, and then you have this giant Toeplitz
matrix; it's Toeplitz and circulant, and you can diagonalize this
guy again with that long DFT and then kill it. But for that, I
never found a quantum analog, because the matrices I could write
down were not unitary. So okay.
I'm doing very badly on time. How much more time do I have?
Like 20 minutes?
>> Krysta Svore: We have the room until noon.
>> Martin Roetteler: That sounds very good, but I don't want to,
okay, let me move on. Okay. This was the prime application
right now in quantum computing: just sampling from the Fourier
spectrum of a function.
But what about this idea? Classically, it's very beneficial to
perform convolution of two signals. Obviously, right, suppose
you have some signal in your memory, it's prepared for you, and
you want to find out how much or how well that correlates with a
given reference function, right. Maybe you're interested in
parts of that signal that look like this, and you want to find
out the exact locations in space where it correlates very well.
And so classically, it's known that this can be done in N log N
operations. If N is the length of the signal, you can do it in N
log N operations. How? Well, you can just invoke the Fourier
transform again. You take the Fourier transform of the signal
and of the function you want to correlate it with. Then you
perform a point wise multiplication of these two spectra and do
an inverse Fourier transform. It can be shown mathematically
that that sequence of operations just corresponds to the
convolution of the two functions, F and G. It's a cyclic
convolution [indiscernible].
And because all of the DFTs can be done in N log N operations,
and the [indiscernible] multiplication is just N operations, the
whole thing is N log N. Okay? So that suddenly looks like a
very good idea for the quantum computer, right, because the
quantum computer can do these Fourier transforms essentially for
free. It's just log N squared operations. Why not use it to
perform massive convolutions on exponential spaces essentially
for free, which would be a fantastic application for a quantum
computer? Why can't we just do that?
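The classical recipe just described, transform, multiply pointwise, transform back, can be checked in a few lines (my own sketch, not the speaker's code):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
f, g = rng.standard_normal(n), rng.standard_normal(n)

# direct cyclic convolution: (f * g)[j] = sum_k f[k] g[(j - k) mod n], O(n^2)
direct = np.array([sum(f[k] * g[(j - k) % n] for k in range(n)) for j in range(n)])

# convolution theorem: DFT, pointwise product, inverse DFT, O(n log n)
via_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

assert np.allclose(direct, via_fft)
```

The quantum obstacle discussed next sits in the middle line: the pointwise product by the fixed spectrum of g is a diagonal operation that is in general not unitary.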
So you see what the problem is?
>>: [inaudible].
>> Martin Roetteler: Yeah, the multiplier not only gets
expensive, it gets impossible, because
>>: The exponential size.
>> Martin Roetteler: It's an exponential-size space, and it might
not be a unitary operation anymore. You see? You're right. So
it has these exponentially many components, the spectrum of that
reference function, that we need to know anyway. We would need
to know all these different frequencies and then perform these
little multiplications.
But on top of this, some of these frequencies might, might even
be zero, right, for all we know, or they might be very non
uniform. So that pointwise multiplication does not correspond to
a unitary in that case.
But the thing is, and this is kind of the message of this talk,
for some functions, it is a unitary. And that's exactly when
your spectrum of G is flat, okay. So let's plot the absolute
value of the spectrum. If that looks like this, if it's constant
and, say, it's one over square root of N, or also if it's very
close to that, then we can perform the pointwise multiplication
by just applying a diagonal matrix with these elements on the
diagonal.
So let's cook up a matrix, call it UG, which has the Fourier
transform at zero and so on, up to [indiscernible] minus one, on
the diagonal. If they're all flat, then we can renormalize them
by multiplying by square root of N, and then they become complex
phases of modulus one, okay. So that's a unitary. Everything
else is zero. And by multiplying that unitary with the spectrum,
which is just F hat of omega at omega, we get the result that is
just the pointwise multiplication of the two spectra.
Okay. And then we are home and dry, right? Because we take a
[indiscernible] Fourier transform and have prepared the
convolution state. The question becomes, yeah, question?
>>: [inaudible].
>> Martin Roetteler: The question becomes: what functions do we
know which have this characteristic? There's actually a bunch of
questions. The first question is how we actually get that state
into the memory, the state we want to analyze. That's a big
question. And my answer to this is we must be able to get it
implicitly somehow. We must have a circuit that's very small
that prepares that state for us. So you might ask, well, if I
already have a circuit, what questions can I ask about it?
It's certainly not going to be a two dimensional picture of some
landscape where I'm going to find an object, right, because we
don't have a circuit for that. We'd have to actually prepare
that thing into the memory.
But F arguably could be a function that we want to analyze.
Maybe it is a pattern of a cipher, the output of a cipher. Now
you shift that same pattern in time, you get the same pattern,
but it's shifted in time. We certainly have a circuit for that.
So arguably we can do it, but that's a tricky point, how we
actually get F.
The second tricky question is your point: how do we actually
perform these diagonal elements out here? There must be some
knowledge about G in order to do it. And the third question is:
for which G is that applicable at all? What functions actually
have a flat spectrum? That sounds like a little too much to ask
for, a function with a flat spectrum.
But the third thing actually is not so untypical, because if you
think of, say, a random function, it has a spectrum that, well,
is not flat, but it fluctuates around this one over N, right.
It fluctuates. So it maybe looks a little bit like this, but
that's kind of good. So we could presumably renormalize that
function, right. We could divide the actual value by this and
make a phase out of it. Or we might do some other tricks where
we actually add a qubit and implement that rotation even though
it's non unitary.
Some of the slides I have deal with that situation of
renormalizing. But when I first studied this problem, I was
interested in what classes of functions there are that produce
an exactly flat spectrum. So in the finite, in the discrete
case, you can actually find such functions that are flat in time
and also flat in frequency. Yeah?
>>: If you can prepare [indiscernible] DFT to take a logarithm
of that before taking [indiscernible] DFT.
>> Martin Roetteler: You mean the logarithm of
>>: Of the DFT.
>> Martin Roetteler: Of the DFT as a transformation. Like the
DFT, [indiscernible], and take the log of this? I think that
would not be so easy. Wouldn't you get something like, so the
eigenvalues of the Fourier transform, there are just four, right?
There's plus minus one, plus minus i, according to some pattern.
And the matrix that diagonalizes it, that's a little bit tricky
to write down. And the eigenbasis of the Fourier transform, you
can do it, of course. Like you can
>>: Solve it here. If you take the log and then take the
inverse DFT, you've got a [indiscernible] speech analysis.
>> Martin Roetteler: I see.
>>: And it separates out pitch from frequency periodicity.
>> Martin Roetteler: I see.
>>: You have the fundamental pitch information and can, for
instance, become speaker independent to make the same thing
[indiscernible].
>> Martin Roetteler: I see.
>>: [indiscernible].
>> Martin Roetteler: I see.
>>: And if you filter that at the end, you now have something
you can do a lot of analysis on for speech.
>>: Yeah, but that's [indiscernible].
>>: I'm just saying if you actually get the logarithm, because
if you didn't have G, you only had DFT, a path which you could
prepare, let's say
>> Martin Roetteler: It's a good point. I would probably not try
to diagonalize the DFT in that case, do something with the
spectrum, and bring it back. I would probably look for a direct
method of implementing the cepstrum. It's a long time since I
heard about that, and I forgot what the kernel was. But I
thought about some transformations related to the DFT, for
instance the fractional Fourier transform or chirp transforms.
They can typically all be implemented, or cosine transforms,
they can be implemented.
The problem with these things is always that there doesn't seem
to be a good use case in quantum computing. So there has been
some work on wavelets, for instance, in quantum computing, like
how to implement wavelets, [indiscernible] cosine transforms and
so on. But I never heard of any really good killer applications
for any of these methods. The problem is always the same: how
do we get the signal inside the computer, and what is the
question we actually want to solve by filtering, for instance?
But having said that, there might be some applications. Maybe
there's really a use case where we want to filter something out
from a signal to make it independent or eliminate a feature or
so. And I've not thought much about these use cases. Yeah.
More questions? No?
So the question is what functions we can find which are flat.
Before I explain some of them, the application for this would be
the so-called hidden shift problem. The hidden shift problem is:
you have a function, and now you get a shifted version of that
same function, and you want to find out what that shift is. You
already get the sense that that's very related to this, right?
Because if you have your signal and now you shift it, say, in
time, that's exactly the case of this.
But this shift, this notion of the shift could be more general
than just a time domain or a cyclic group. It could be anything.
It could be a Boolean vector taking the XOR, or it could be any
other group, really. You could formulate that problem for any
group.
The catch with that is, again, well, it's nice, it's a very
general concept, but it really works well only if the group is
abelian, and even for abelian groups that specific problem is
hard. If it's a large cyclic group, it doesn't work either. So
nobody knows how to solve the hidden shift problem over a large
cyclic group.
So right now, the only thing I know is that it will work over
the Boolean domain. There it works really well. So if that's
the Boolean XOR operation, or one can also extend that to, say,
Z3 to the N or some very small but constant modulus, then one
can find this S. But other than that, it's very open whether
one can attack it, and that actually relates also to the hidden
subgroup problem. So whenever you have a problem like this, you
can set up an instance of a hidden subgroup problem; you can
find a suitable group. It's no longer the same group here.
Even if that group is abelian, you typically get something non
abelian.
>>: [indiscernible] two element groups.
>> Martin Roetteler: Not two element. I mean the binary vectors
are the direct product of the Z2. So, I mean, Z2
>>: Two to the
>> Martin Roetteler: To the N. What happens if you make the
reduction from, okay, I don't know a good name, it's called the
HSSP, okay? No, let's call it the hidden shift problem, right.
You can reduce hidden shift problems to hidden subgroup
problems. So if the hidden shift problem is over an abelian
group G, you can set up a hidden subgroup problem over, again,
the semidirect product of that group. Sorry for those who don't
know the concept of a semidirect product, but it means you take
two copies, no, not two copies. You take the group and you add
another component, and now, in order to multiply two elements,
you're no longer independent. You have to take into account
what's written in the extra component to know what's going on.
So that kind of
>>: [indiscernible].
>> Martin Roetteler: I was thinking of the [indiscernible]
group, right, which is the special case where they interchange.
But it could be modular, it could be any action of a Z2. In
that particular case, it's actually the action by inversion,
right. You take the inversion action, and that defines a
semidirect product. And in that case, you get the
[indiscernible] group, DN.
And so here's already the issue with that. If you want to solve
the hidden shift problem, here we end up with a hidden subgroup
problem for the dihedral group, and nobody knows how to do that.
And if one knew how to do that, one could actually tackle some
lattice problems. So that's kind of why just applying that
idea, if our goal is to shift things in time and to identify the
time shift, right, it's not as easy. We don't get a free lunch.
We cannot just say we'll reduce it to the [indiscernible] hidden
subgroup problem and then solve it. More has to be done.
But if it's as simple as the Boolean domain, it works, and I'm
going to show you next how to do that for the Boolean domain.
>>: You can also apply it to GF2, the [indiscernible], because
that's the hidden subgroup problem for ciphers over that.
>> Martin Roetteler: What do you mean, like the group, the group
G here?
>>: Yeah, take Z2 to the N quotient [indiscernible].
>> Martin Roetteler: And then you're interested in the
multiplicative structure of that guy, that's hard. That's going
to be hard, unfortunately.
>>: It is hard, but it's the same. I've heard, I don't know,
but I've heard that people apply hidden subgroup techniques to
find the period there as well.
>> Martin Roetteler: Oh, okay. If you just want to find the
period, that's fine. If you want to find the order of an
element, so now let's assume your polynomial was irreducible.
Now we've got a beautiful field. You get an element, you want to
find out the period of that. That's really period finding. And
yes, we can do it.
Here, it seems even simpler, because we just take the additive
structure of that group, not the multiplicative structure, which
is this. But we shift the function by an unknown shift and we
get that black box, which implements the shifted function.
And the only thing we're allowed to do is ask the black box for
values, like input and function value pairs. And from that
alone, one can actually show a classical algorithm cannot do
this. There's a relatively simple argument, let's see if I have
that somewhere, yeah, that a classical algorithm will be faced
with the problem that it makes a few evaluations of the function
and then there's always a huge number of shifts that are
completely consistent with that data, right.
So the classical algorithm, whatever it is, let's assume for a
moment it's deterministic, it would have queried that function
and the shifted function, but it will have access to only these
very few points where it actually queried. Sorry, no, the red
guys are the ones it queried. And if you're an adversary, you
can actually change the problem to a different S that's
completely consistent with all these samples that were made.
And that's why a classical algorithm typically has an
exponential query complexity for these problems.
But in quantum, one can solve it actually with one query. I
want to show you how that works. So there are some cases of
functions that have that property. For instance, the Legendre
symbol has that property, that it has a flat spectrum.
Technically speaking, you have to take out the zero point here;
it drops to zero here, but the rest is flat.
In the case of the Boolean domain, so Z2 to the N, there are
functions which are completely flat. The Fourier transform then
is the so-called Walsh-Hadamard transform. It's this function.
And there are functions, so they're plus minus one functions,
that have the property that all the frequencies, where the
frequency is defined like this, are all the same, all like this.
They're sometimes called bent functions in cryptography. They
can only exist if N is even. There are some known families, but
in general it's unknown how to construct them. There's no
complete classification of them as far as I know. But some of
them are very easy to write down. For instance, if you
partition the input variables into two blocks and take the inner
product between these two blocks, that's always a bent function.
And so on. There are a few cases like this.
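The inner-product construction can be checked directly. This small sketch (mine, not the speaker's) computes the Walsh-Hadamard spectrum of f(x, y) = x . y on n = 4 variables and verifies that every coefficient has the same magnitude 2^{n/2}:

```python
import numpy as np
from itertools import product

m = 2
n = 2 * m                     # bent functions exist only for even n
N = 1 << n

def f(bits):                  # inner product f(x, y) = x . y  (mod 2)
    x, y = bits[:m], bits[m:]
    return sum(a * b for a, b in zip(x, y)) % 2

# signed truth table F(v) = (-1)^f(v)
vecs = list(product([0, 1], repeat=n))
F = np.array([(-1) ** f(v) for v in vecs], dtype=float)

# Walsh-Hadamard spectrum: F_hat(w) = sum_v (-1)^(w . v) F(v)
H = np.array([[(-1) ** (sum(a * b for a, b in zip(w, v)) % 2)
               for v in vecs] for w in vecs], dtype=float)
spectrum = H @ F

assert np.allclose(np.abs(spectrum), np.sqrt(N))   # perfectly flat: |F_hat| = 2^(n/2)
```

After normalizing by 2^{n/2}, every coefficient is a sign, which is exactly the "flat spectrum" condition the diagonal unitary needs.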
So there are examples of functions which are flat, and if it's
flat, you can do this. You just prepare the zero state, again a
register with many zeroes. You prepare a superposition over X.
You evaluate the shifted function, and let's say we compute it
into the phase; that can always be done. And now we compute a
Fourier transform. And now we use the fact that, again, if you
shift a function, in the Fourier transform you pick up a linear
phase. That's the really important thing. If you don't
remember anything else from this talk, this is the only thing I
want you to remember. If you take a shifted function, you have
a linear phase out here, and now if you can uncompute that
thing, it's kind of backwards from that discussion here, right,
the roles are interchanged between F and G. But if you can
uncompute this by multiplying with a diagonal matrix, you end up
with this state, and then you do another Fourier transform and
you get S. Exactly S, and you get it deterministically.
Okay. So the algorithm looks like this. It takes a Hadamard
transform to prepare all the inputs. It evaluates G. It
performs the Fourier transform, then uncomputes all these Fourier
coefficients, Hadamard transform, and you get S.
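For a bent function, those steps can be simulated classically on a few qubits. The sketch below (my own, using the inner-product function, which is its own dual, and an assumed shift s just for the demo) recovers the hidden shift with certainty:

```python
import numpy as np
from itertools import product

m, s = 2, (1, 0, 1, 1)        # n = 4 qubits; s is a made-up hidden shift for the demo
n, N = 2 * m, 1 << (2 * m)
vecs = list(product([0, 1], repeat=n))

def ip(a, b): return sum(x * y for x, y in zip(a, b)) % 2
def f(v): return ip(v[:m], v[m:])          # inner-product bent function; self-dual

H = np.array([[(-1) ** ip(w, v) for v in vecs] for w in vecs]) / np.sqrt(N)
xor = lambda a, b: tuple(x ^ y for x, y in zip(a, b))

psi = H @ np.eye(N)[0]                                       # Hadamards on |0...0>
psi = np.array([(-1) ** f(xor(v, s)) for v in vecs]) * psi   # phase oracle for g = shifted f
psi = H @ psi                                                # Walsh-Hadamard transform
psi = np.array([(-1) ** f(w) for w in vecs]) * psi           # uncompute the dual bent phases
psi = H @ psi                                                # final Hadamards

assert np.argmax(np.abs(psi)) == vecs.index(s)               # the outcome is s
assert np.isclose(np.abs(psi[vecs.index(s)]), 1.0)           # with probability one
```

One oracle phase evaluation suffices, matching the one-query claim in the talk.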
So the only difficulty, in quotes, is that if you have a general
function that's not bent, we might not have a flat spectrum
anymore. And typically, you get very non flat spectra. That
depends on the circuit somehow, right; the more shallow the
circuit is, it turns out the high frequency guys die off
somehow.
>>: [indiscernible].
>> Martin Roetteler: You must have the promise that they are
shifted, between G and F. You must have the promise. This F
star here is the dual bent function. That's a promise you must
have on top of it.
>>: You know, all you really need is some way to do that
multiplication. Having it flat is one way to do it.
>> Martin Roetteler: Yes.
>>: Suppose, though, it was a tractable tensor product of such
functions, for example. That is, it's a bunch of different
things and it's a tensor product of them. So I can materialize
the tensor product in the same way. I don't know if you could
undo it.
>> Martin Roetteler: I don't fully understand, but it sounds to
me that you would like to take several copies of something and
then use those to implement this on a subspace, maybe, and
forget about the rest. It turns out that's a good idea. But in
quantum, we've got to be very careful that we don't entangle the
stuff we want to have inside our computation space with anything
outside. So if we did something like this, we must make sure
that at the end of the day, all these extra tensor factors
aren't entangled anymore. Either they're cleaned up, they're
brought back to the same state, or we measure them and we can
still say something about what happens here. I wouldn't want to
say we looked at this idea, but I'm going to show you one idea
how we can renormalize coefficients by looking at a larger
space. But the larger space will not be so much larger. It
will be just one extra bit, and we'll use that one extra bit to
perform a rotation so that we can do any angle here.
But then, in practice, that might be very expensive, because the
rotation depends on the frequency and could be anything, right.
It could be any angle, so it could be arbitrarily bad. But then
this result is really just a query result. It shows that we
only need to query these boxes a few times. It doesn't say
anything about the time complexity of the operations we need to
do. But query complexity is a very well established model.
Many people have looked at that. Were there more questions? No?
Okay. So if it's not flat, actually Dave Meyer, and I think it
was a Ph.D. student of his, they proposed a method to
renormalize it like this. But that leads to very big
distortions, especially if your function dies off quickly. You
might pay a very big price when you renormalize it. It might
not be a good phase element. I think of these things actually
like phase elements in optics. So the Fourier transforms are
lenses, and these things are like a phase element that just
modifies.
So we want to design a phase element that performs the
convolution for us. People have done that in optics, but there
don't seem to be a lot of rigorous statements about these
methods. There are a lot of heuristics for how to do it, how to
enforce that it's a convolution and it's flat, and I can tell
you more about that if you're interested. But there doesn't
seem to be a lot of rigorous analysis of how long it takes to
converge to a phase element and so on.
So we took a different route. We wanted to come up with these
phase elements, but to do it systematically. Just to rehash, we
have two cases, as kind of extreme cases. For the bent function
case, we know just one query is enough, using the algorithm I
just showed, to identify the shift. On the other hand, the
delta function is also a valid function. You could ask to find
the shift between two delta functions, but that reduces to
search, right. You could encode a search problem into that: if
you were able to find shifts between delta functions, the shift
could be the answer to some arbitrarily complicated problem you
want to solve.
And for that case, it's known that you can never do better than
square root of N, no matter how hard you try, even if you take
several copies. There's a result showing that for search, a
lower bound on the query complexity is square root of N, and it
can also be matched with an upper bound. So the true answer is
square root of N.
But if you look at the spectrum of a delta function, when you
have it as a plus minus one function, so it's constant
everywhere except for one point where it's minus one, it again
looks like this. There's this huge peak. These things look
very different, right? On the one hand, you have something
that's constant; this guy has this huge peak. So this one is
easy, this one is very, very hard. The intuition is: the
flatter it gets, the easier we can solve these hidden shift
problems.
So we wanted to make that notion precise, and it's very
tantalizing, because if you look at the algorithm, you see, this
is the hidden shift algorithm, right? Hadamard, then you
evaluate G, Hadamard, F, and Hadamard, and you're done. What
does Grover do? Well, it does an initial Hadamard to set up an
equal superposition, and then it iterates the same operator
many, many times, and that slowly, slowly, slowly rotates the
state into the form you want.
And then at the end of the day, if everything can be done
exactly, say if square root of N is an integer, you actually
also get S back. So they look very similar, right? It's just
that you get away with one round here, and here you do many
rounds. And it turns out that we can make that more precise.
So the idea is we have, say, a general function. We get a
spectrum that might look a little bit spiky. And the spectrum
is really this: it encodes the information about the shift in a
phase like this, and it has these Fourier coefficients.
What we would want is to forget that, to get that state, right,
and then we do another Fourier transform and we're done. Here,
we just Fourier transform the state and we get S. So the
question is: if we have a copy of this state, or several copies
of this state, how can we make this state? What kind of process
do we have that makes it?
And the idea is extremely simple. If we have this distribution
according to the F's, how can we make the flat distribution out
of it? In classical analysis, classical sampling theory, that's
sometimes called the rejection method. If you have the ability
to sample according to some distribution, call it P, but you
would really like to sample according to S, there's a way to do
it.
So what you can do is you can renormalize this S so it fits
under P, and then, in order to produce a sample, you do the
following. You first sample from P, which is what you can do;
you have a physical apparatus to sample from P. And then you
make a secondary thing: you toss a coin, and you accept that as
a sample if and only if, so you generate a uniformly distributed
variable, and you accept if that outcome was under, in here,
right. If that was in here, you accept. If it falls within
here, you don't take the sample, and you redo the procedure.
You pick another X and do the same thing.
It can be shown that the expected number of times you have to do
that is just one over gamma. Okay. So that's actually also the
downside of this method. If you have to rescale it very much so
it fits, if your gamma, your multiplier, is a very small number,
you might have to do it a very, very long time.
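The classical rejection method just described fits in a few lines (my own sketch, with made-up numbers): draw x from P, then accept with probability gamma * s(x)/p(x), where gamma = min_x p(x)/s(x) is the largest scale that keeps every acceptance probability at most one; the expected number of draws per accepted sample is 1/gamma.

```python
import random

p = [0.25] * 4                     # distribution we CAN sample from (uniform)
s = [0.1, 0.2, 0.3, 0.4]           # target distribution we want samples from
gamma = min(pi / si for pi, si in zip(p, s))   # largest valid scale; 0.625 here

def rejection_sample():
    while True:
        x = random.randrange(4)                    # draw x ~ p
        if random.random() < gamma * s[x] / p[x]:  # accept with prob gamma*s(x)/p(x)
            return x                               # otherwise redo the procedure

random.seed(0)
trials = 200000
counts = [0] * 4
for _ in range(trials):
    counts[rejection_sample()] += 1
freqs = [c / trials for c in counts]   # empirical frequencies approach s
```

With gamma = 0.625, the expected number of draws per sample is 1/gamma = 1.6; the delta-function case the speaker describes is exactly the regime where gamma collapses and almost everything is rejected.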
And so that allows you to sample from something you actually
have no physical machine for. Maybe you have no way to sample
from this S directly, but by tweaking it and doing it several
times, it works like this. This thing we can do quantumly, too.
How this works is as follows. Instead of a probability
[indiscernible] P, we have a state, a coherent state. It has
certain amplitudes. They correspond to the probabilities, and
it can be entangled with another register, actually.
And that other register could be anything, actually. We don't
make any assumptions; it could even be an infinite space. No
assumptions made about that. What we want is to keep the same
entanglement between the K and that other register. We want to
maintain it, but we want to change the amplitudes to different
amplitudes, sigmas. So how can we do it?
In that case, it's assumed that we have complete knowledge about
the sigmas. They're completely known. But what is not known is
these states. In the application to the hidden shift problem,
they are just phases. It's not even a qubit; it's just a phase,
a complex phase that somehow encodes S. But we would like to
maintain the phase to make it an equal superposition.
So it turns out, if you just want to do this exactly, no error
allowed, then one can exactly determine what this gamma is.
There's no choice. You have to take the minimum of these
quotients here, and then that's going to be the query
complexity. That's how many copies of that state you will need
to perform the task. That's not very good if you have to do it
exactly, because that could be a very, very small number. Some
of these sigmas could be very, very small, and that would kill
the result.
But it's fine to prepare the target state with some error, and
that will help us a lot.
>>: [indiscernible] a smaller number, not a bigger number.
>> Martin Roetteler: But it turns out, if some of the sigmas are
small, you get a very, I'm sorry, maybe it's the pis that are
bad. Some of the pis are bad.
>>: But one of the gamma should be a max.
>> Martin Roetteler: But the problem is, that's a very small
thing. Then we get a very large, like in the case of classical
sampling. So when does it happen that that's very small?
>>: It's just in your picture
>> Martin Roetteler: Right, you have to scale, right, right. In
scaling, we have to multiply with a very small number. That's
what scaling does.
>>: What happens when
(Multiple people speaking.)
>> Martin Roetteler: I think it's fine.
>>: They're rather different here. This gets really low and
goes up.
>> Martin Roetteler: That's right.
>>: Suppose P were minus S or something.
>> Martin Roetteler: Right, exactly. Or assume that some of
these components are really very, very small. Maybe drops off
very quickly. That's exactly what happens, actually, for the
delta functions. One is very big. That's fine. But all zeroes
so whatever you get, actually, you have to scale it. So under
all these almost zero components, they're not exactly zero.
>>: Reject almost everything.
>> Martin Roetteler: You reject almost everything. It actually
turns out if you just apply that to the search problem, you don't
even get the Grover speed up. You're worse than Grover if you
just do [indiscernible]. But it's kind of a middle ground
between bent and the Grover case where this method is better than
what you can do naively.
So the point is we can allow an error. So the algorithm
now looks like this. So you start with the state you're given.
That corresponds to the distribution P. And now you pick an
additional qubit. Just need one more qubit, zero. And what you
do is you make a conditional rotation, conditional on K. You
rotate that qubit into that state. You can do it. That's
basically a big block diagonal matrix with two by two blocks that
all perform these rotations.
We can do it only because we have that knowledge about what the
pis are -- and okay, sorry, this notation here changed from that
slide, because I stole it from various slides. Those are the
sigmas, okay? Those are the previous sigmas. We have complete
knowledge about the pis and sigmas, so we can perform the
rotations. And then we measure the
first register, and we keep only the case where we actually
measured one. If you measured one, then we have a state that
looks like this: it's the deltas, divided by the two norm of the
whole delta vector, which is exactly what we wanted.
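The conditional-rotation-and-postselect step just described can be sketched numerically. This is my own minimal numpy simulation of the amplitudes involved, not code from the talk; `p` is the given distribution and `s` the target:

```python
import numpy as np

def quantum_rejection_sampling(p, s):
    """Simulate the amplitude version of rejection sampling: start from the
    state sum_k sqrt(p_k)|k>, rotate an ancilla qubit conditionally on k,
    and postselect on measuring the ancilla in |1>.  The surviving branch
    is the renormalized target sum_k sqrt(s_k)|k>."""
    p, s = np.asarray(p, float), np.asarray(s, float)
    amp = np.sqrt(p)                         # amplitudes of the given state
    # Largest gamma that keeps every rotation amplitude at most 1.
    gamma = np.min(np.sqrt(p[s > 0] / s[s > 0]))
    rot = gamma * np.sqrt(s) / np.where(amp > 0, amp, 1.0)
    branch1 = amp * rot                      # ancilla = 1 branch, gamma*sqrt(s)
    p_success = float(np.sum(branch1 ** 2))  # chance of measuring ancilla = 1
    return branch1 / np.sqrt(p_success), p_success

target, p_succ = quantum_rejection_sampling([0.5, 0.3, 0.2], [0.25, 0.25, 0.5])
print(np.round(target ** 2, 3), round(p_succ, 3))  # target probs equal s
```

Note how the success probability comes out as gamma squared, which is exactly the small number discussed above when some components are tiny.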
>>: But the one could be very rare.
>> Martin Roetteler: Exactly.
>>: So we have to run it a lot, and still the same problem.
>> Martin Roetteler: Exactly. And we know that there are cases
where we just cannot hope for anything better, because the delta
functions, right? The search, we cannot hope for better. But
this method helps if your function is not bent. If it's bent,
everything is fine, right? But if it's just a little bit non
bent -- if it's a random function, or if it's bent but you change
only a very few values, that will only change the spectrum
slightly.
What can we do then? The original algorithm doesn't apply
anymore. The renormalization by Curtis Meyer might ruin the
whole thing. But this method is kind of graceful. It allows us
to change it in a way so that we can still prepare that state and
apply the hidden shift. The rest of the algorithm is the same.
So the idea is to go from that spectrum to that spectrum, and
then from that state to that state, and then Fourier transform
will do the trick.
Yeah, okay. So of course we need to be able to perform these
rotations and then once we're lucky, we end up with this state,
and we're done. Now we can analyze it, actually -- okay, I'm
sorry. This notation has changed. It kind of twisted my head
now, because what was the epsilon on the previous slide is now a
vector. I
apologize for this. But this essentially tells you like how much
error we allow in each of these rotations when we go to that
target state.
We might not want to go exactly to that, but we might also allow
an error, and that error is baked into that form here. In the
ideal case, there is no error; then all these coefficients
will be one. We get the exact state we want. If there is an
error, we might have like an error vector, which might depend on
W. But we can say what the overall complexity of the algorithm
is. It's exactly given by this one over the epsilon vector.
That's how many queries it will take to convert the state into
the other state.
We can give an interpretation of this vector. That's kind of, I
don't know, some people can get maybe intuition from there. If
you take the spectrum, and now you fill it to some level with
water. The water level is given by the overall error you're
willing to accept. And then you get a vector, that vector will
be an epsilon vector, okay? That vector will be the vector
you're trying to get here. And we can show that for a given
target probability, that's the best the algorithm can do. We can
characterize that using a semidefinite program that we can write
down.
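The water-filling picture can be illustrated in a few lines. This is only my reading of the intuition -- clip the spectrum at a common level -- and the actual level would come out of the SDP just mentioned; the names here are hypothetical:

```python
import numpy as np

def water_fill(spectrum, water_level):
    """Clip the spectrum at a common 'water level': components below the
    level are kept as-is, components above it are cut down to the level.
    This is the shape of the optimal epsilon vector described in the talk;
    the precise choice of level (via the SDP) is not reproduced here."""
    spectrum = np.asarray(spectrum, float)
    return np.minimum(spectrum, water_level)

spec = np.array([0.7, 0.5, 0.1, 0.05])
print(water_fill(spec, 0.4))   # components above 0.4 are clipped to 0.4
```

The higher the water level you can afford (i.e., the more error you accept), the cheaper the conversion becomes.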
>>:
So that's where SDP comes in, picking the water level?
>> Martin Roetteler: Picking the water level out for that
algorithm. Trying to find the best algorithm, the best rotation
schedule, right, because the rotations themselves are not
completely defined because we allow error in each rotation. We
want to find what's the best schedule for these errors, and that
can be characterized as an SDP. It turns out, actually, that
this whole procedure then corresponds to performing the
[indiscernible] rotations and performing a single qubit von
Neumann measurement.
In the modeling here, we assumed nothing. We assumed that it
could basically be high dimensional, the measurement could be a
POVM and so on. But what popped out of the SDP was that one
qubit is enough, and a von Neumann measurement is enough. You
don't have to do a general POVM. That's the optimal schedule for
that.
That result appeared last year at ITCS, which is a computer
science conference.
And this year, we had a generalization of this when we looked at
several copies. Actually, this is just the first step maybe
towards a new interesting result. We just looked at what happens
if we don't have just one copy, but several copies. How could we
entangle them in a meaningful way and what happens if we do that.
So we looked at the so called pretty good measurement and we
found out a pretty good measurement in that case has a nice
structure. Just corresponds to this circuit. That's kind of the
single qubit circuit, and then in order to do the PGM, you just
need to do a bunch of CNOTs in that case. And then what you get
is a state that looks like this. It has the information about
the shift. It's entangled with a big register, and a priori we
cannot say much about that. But it turns out that this kind of
performs the self convolution of the function with itself. The
more copies you take, the more you take the initial spectrum and
convolve it with itself -- you perform a convolution of the
function with itself and then take its Fourier transform.
>>:
So it's a central [indiscernible].
>> Martin Roetteler: We didn't get there yet. We haven't done
that analysis yet -- like, how to really relate T with the
success probability. We have just one small result we showed.
But we have not shown yet how to really improve, like how to
turn that into an efficient algorithm to find S by using many
copies. We have not done that yet. And for that, we would need
to do a study -- it's probably not too involved. I don't think
we need very advanced probabilistic methods.
>>: Because [indiscernible] those things is just adding the
[indiscernible].
>> Martin Roetteler: What you do is you take the spectrum, you
[indiscernible] -- yeah, exactly. You have independent
variables. You add them up.
>>: You're adding them up, so they converge like
[indiscernible].
>> Martin Roetteler: I think so too.
>>: Or whatever.
>> Martin Roetteler: I like to think of this in terms of the
spectrum of the function. I just look at the spectrum and I
raise the coefficients to a power, which means I drop them
exponentially -- I drop them geometrically. And the largest one
will survive that process. It will all focus on the largest
one. That's very, very fresh. Very ongoing research, what we
can do with that idea.
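That concentration effect is easy to see numerically. A tiny sketch (my own, with a made-up spectrum): raising the coefficients to a power and renormalizing pushes almost all the mass onto the largest one.

```python
import numpy as np

def concentrate(spectrum, copies):
    """Raise each (absolute) coefficient to the given power and
    renormalize; geometric decay means the largest coefficient ends up
    dominating the distribution."""
    v = np.abs(np.asarray(spectrum, float)) ** copies
    return v / v.sum()

spec = [0.5, 0.3, 0.2]
for t in (1, 4, 16):
    print(t, np.round(concentrate(spec, t), 4))
```

With 16 "copies," essentially all the weight sits on the 0.5 coefficient.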
But my feeling tells me we have to look at several copies if we
want to use it to go from the Boolean domain here to, say, if you
want to go from here to that domain. One of the things to try
would be to take several copies and see what the spectra look
like. If we take several copies and look at the PGM, what
will happen?
In a sense, that has already been studied in the paper by
Bacon, Childs and van Dam. They've looked at the dihedral HSP
and the PGM approach. Maybe that could rediscover some of those
results. Maybe it will help to improve some of the things they
did, because it's a different flavor. It's a Fourier analytic
flavor. And what they did is they reduced
it to subset sum problems. Which might be related to that idea
of taking sums of several independent variables.
But when you actively try to uncompute, you get these values.
Now you perform a computation which tries to kind of do a bin
packing for a target vector. But I always like Fourier methods.
That's very intuitive to me.
>>: Start with the exponential advantage.
>> Martin Roetteler: That's right, that's right. So at the end
of the day, we still have an advantage to show.
>>: So the gold has been found in that neighborhood.
>> Martin Roetteler: Yeah, the gold is -- what, like 1849? It's
not 1849 yet. I think I'm running out of time. I'm going to
skip this part, I think. I want to leave you with one message.
Because we talked about hidden sub groups already. So the
message is that hidden subgroup problems, they really depend a
lot on the group structures. Abelian ones are fine. They can be
done efficiently. But in the non abelian case, if the group is an
[indiscernible] group, for instance, it's already not known how to
solve them. The symmetric groups are not known -- even if you take
two copies of -- if you take an SN, like SN itself, it's
not known how to solve it. And there are encouraging things,
though. Like, information theoretically, it's known that there's
always enough information about the hidden subgroup if you take
enough copies.
That's why many people were actually rushing in that direction,
or were rushing up to some point. It's known that there exists a
measurement, a POVM, that takes as input several copies of these
coset states and then outputs the information you're interested
in. That
picture is specific for the graph isomorphism problem, very
related to the isomorphism problem, but it's a generic picture.
Whatever your hidden subgroup problem is, you can take several
coset states and you can extract it, and it's enough to take
log G many copies.
But nobody knows how to implement this, and we were asking the
question. So do we really need to do that? Do we need to
entangle all these coset states and do some classical processing?
Or is it maybe enough to just take a single coset state and
measure it -- is that enough?
So is it maybe enough to use K copies, where K is maybe two or
three or some small number. Maybe that's enough to extract the
subgroup. But it
turns out that that's not enough. So Moore, Russell and Schulman
have shown that for two and for three, it's not enough, and then
in the STOC paper a few years ago, we showed that you have to go
to N log N of these copies if you want to do graph isomorphism.
That's actually kind of depressing, because it's a very big
quantum computer you need if you want to tackle graph
isomorphism. It's almost excessive, right, because there might
very well be a classical efficient algorithm. So you don't need
any quantum computer at all, maybe, to solve that problem.
And in practice, it seems like it works well to just look at the
spectra. And here we would need a gigantic quantum algorithm
that takes N log N of these registers and each register has many
qubits. And we don't even know how to implement the measurement.
But still, there was an analysis of the Fourier approach to that
problem. And it's representation theoretic: it works by
analyzing -- like, there's a certain basis in which all these
states become block diagonal at the same time, and the basis
transform is the Fourier transform over that group.
And after you do it, your several copies of the coset states,
they will look block diagonal. You can do a POVM that selects
just one block. And the question is how do you further process
that information in the block. And so we show that actually
there's no good basis at all for the case where you want to
distinguish a trivial subgroup from an order two subgroup in the
symmetric group.
Intuitively, what happens is if you have the case where it's
trivial, it's just flat. If it's order two, it's also pretty
flat, no matter what basis you choose. So there are
probabilistic arguments coming in, but we were able to show that
the distance of the probability distribution that you're able to
sample, no matter what basis you choose, and the uniform
distribution looks like this, where K is the number of copies.
And this is a very small term so you need to lift that up in
order to even have a chance to solve the problem.
And technically speaking, it boils down to bounding expressions
like this, they're like character expressions. These are the
characters of the symmetric group. H is an involution. D is a
degree of the representation, and we get kind of a projection.
This is the geometric part of the expression. This is the
representation theoretic part, which ranges over all the
characters. And
it can be shown that those quantities are very, very small unless
K is N log N.
And it's kind of a win win analysis. We say either the
representation has a very high degree -- like for SN, typically
the representations have exponential degree. They're very large.
And for those, that quantity is extremely small. So we can kind
of neglect them, or we can bound them.
In the other cases, if the representation is low dimensional,
then that fraction here might be a significant quantity. But
then we have geometric arguments that this part also has to be
small.
So we can show that this whole expression is very small, meaning
no matter what basis you chose so carefully, you will sample
something that's very close to the uniform distribution and that
will have no information about the subgroup.
So that was, roughly speaking, the argument behind it. So
essentially, that stopped all the research up to now -- as far
as I know -- in this hidden subgroup approach to graph
isomorphism.
There are other very exciting approaches where people try to
apply more physical intuition to it -- maybe, like, walks: try to
define a walk on the two graphs and see if the walks are
different.
I don't know very much about that. I know that there were also
[indiscernible] found with those approaches, but maybe that's a
better approach to tackle graph isomorphism in the long run.
Okay. One way, actually, one can make use of this negative
result is one can define a crypto system that's secure against
certain types of attacks. Namely, you can define a one way
function. If you could break that one way function, you would
actually be able to solve graph isomorphism. So because we have
that result, we know one would need a pretty large quantum
computer to actually mount this attack. The one way function
here is essentially just to pick a bunch of random vectors and
multiply your input with them -- a matrix multiplication with a
random matrix.
But it's known that if you could break that, then you could also
do graph isomorphism. But there's more to be explored here. It
would be nice to see other crypto systems that have this flavor,
and it would be nice to see more attacks on crypto systems.
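The matrix-based one way function can be sketched like this. This is my own GF(2) toy with made-up dimensions; the actual construction tying inversion to graph isomorphism is not spelled out in the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

def keygen(n, m, rng=rng):
    """Public key: an n x m random binary matrix -- the 'bunch of random
    vectors' from the talk, stacked as columns (dimensions are made up)."""
    return rng.integers(0, 2, size=(n, m))

def one_way(M, x):
    """Candidate one way function: multiply the input by the random
    matrix, with arithmetic over GF(2)."""
    return (M @ x) % 2

M = keygen(6, 4)
x = np.array([1, 0, 1, 1])
y = one_way(M, x)
print(y)       # a length-6 binary vector
```

Evaluating the function is trivial; the point of the result above is that inverting it should require solving something as hard as graph isomorphism.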
Okay. Maybe the last two minutes, my take on D wave. Yeah, I
guess most of you know that it's an adiabatic paradigm. So one
starts with the Hamiltonian for which the ground state is very
well understood. Could be, for instance, all spin up or so. And
then your final Hamiltonian encodes a problem. A problem that
you want to solve. You know that the ground state of that
problem encodes the solution of some hard problem.
And now the idea is if you vary this slowly enough, it will track
the ground state and you end up here. You can sample; you find a
ground state. One path to do that would be, for instance, this.
But this is by no means the only one. There might be many paths
that go from the initial Hamiltonian to the final Hamiltonian.
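The straight-line interpolation just described can be written down directly. A small numpy sketch (my own toy Hamiltonians, not D wave's): diagonalize H(s) = (1-s)H0 + sH1 along the path and track the spectral gap, which governs how slowly one has to go.

```python
import numpy as np

# Transverse-field start Hamiltonian (ground state: uniform superposition)
# and a diagonal problem Hamiltonian whose ground state encodes the answer.
n = 3                                   # number of spins (toy size)
X = np.array([[0, 1], [1, 0]])
I = np.eye(2)

def kron_at(op, k):
    """op acting on qubit k, identity elsewhere."""
    mats = [op if i == k else I for i in range(n)]
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

H0 = -sum(kron_at(X, k) for k in range(n))        # initial Hamiltonian
H1 = np.diag(np.arange(2 ** n, dtype=float))      # toy cost function

gaps = []
for s in np.linspace(0, 1, 21):
    H = (1 - s) * H0 + s * H1                     # straight-line path
    evals = np.linalg.eigvalsh(H)
    gaps.append(evals[1] - evals[0])              # spectral gap along the path
print(round(min(gaps), 4))                        # smallest gap sets the speed
```

Other paths between the same endpoints would give different gap profiles, which is exactly the freedom mentioned above.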
So in our group, actually, when all this D wave stuff came out, I
asked the members in our group, let's look at this, because we
cannot ignore it. It might be very unlikely that what is
advertised here is, indeed, working like this. But there might
be a probability of maybe ten to the minus five or four or
whatever that this works. And if that's the case, the impact
would be huge.
So we cannot completely ignore this whole approach, because if
it works, the impact is huge. Even though, from my point of
view, at least, this statement that it tracks the ground state
is, I think, not what happens in that system. There's probably
much more noise happening, so it's not just tracking the ground
state.
So I asked the members -- at that point, Boris Altshuler, who is
a condensed matter physicist, and we had a post doc, Harry Krovi,
and Jeremie Roland, who was a research staff member at that
point. They looked at that. They assumed -- let's assume that
the computer does this time evolution. But then the question
they asked was, will it work for solving 3-SAT? Around that
time, that was what was advertised that this computer could do.
It could help you solve 3-SAT.
So they looked at random instances of 3-SAT, which is typically
what you meet in practice. If you want to solve a practical
problem, a machine learning problem, you deal with a random
instance. If you do software [indiscernible], you do some
bounded model checking, or you map to 3-SAT instances and they
look very random. And there are very good solvers for those
around. Like 10, 20 thousand variables are not a big deal for
these solvers. So they could scale even much, much further.
So the question was, what can we say about the performance of
this algorithm in that very special case. They used perturbation
theory and found that at the very end of this time evolution,
right when your T becomes one, you will have a level crossing
and you will not find the solution of your problem.
At that point, I didn't understand anymore why that didn't end
the whole discussion about what a D wave computer is good for.
Because for random instances of NP complete problems, like
3-SAT, the computer is not useful.
>>:
[indiscernible].
>> Martin Roetteler: Exact cover is very closely related. It's
true. It's a different problem, but 3-SAT and exact cover are
both very local in the sense that all the clauses involve very
few variables. But then what happened is there was actually
something very exciting happening. The whole theory of NP
completeness, while it's closed under reductions, that's really
not the end of the story, because reductions don't preserve
randomness well.
Some NP complete problems might not have a concept of a random
instance. And that was immediately what happened after it. We
heard from D wave -- and I have a lot of respect for these
people; I spoke with several of the scientists, and we invited
them to workshops. The problem they were looking at was, instead
of 3-SAT, let's look at clique. Clique finding, same as maximum
independent set, is also NP complete. There's a notion of
randomness for that problem, but it's very different, right? And
it's not closed under the reductions. If you take a random 3-SAT
instance and you reduce it to clique, you will not get the random
instance that you would expect intuitively, right. There's the
Erdos-Renyi model of randomness for graphs: you just pick edges
at random according to some probability, and then you might ask
what's the largest clique in there, right.
And that's what they argued might be a better problem for them to
tackle than 3-SAT. And they admitted that the computer is not
good for 3-SAT. And I think that this finding, to be quite
honest, still relies on some assumption. It relies on the
assumption that by introducing this randomness in the instance,
you get an effect called localization, which actually forces the
wave function to be very local, and that happens at the very end
of the time evolution.
This is just an assumption. So in this paper, they assume that
if you have the graph -- it's like the hypercube -- you will have
this Anderson localization effect, and it's not a proven
statement. It's just an intuition coming from physicists. But
they believe this is what happens for these disordered systems,
as far as I know. That's why I have to say I'm not an expert in
this, but the people I spoke with say yes, this is very credible
evidence that the algorithm would fail for this.
But it now boils down to, if you look at clique, it will boil
down to the size of the clique that you're looking for. If you
just take a random graph, the largest clique you can expect is
actually known to be very, very small. It's just something of
the size 2 log N or so.
If you take a random graph on N vertices -- pick each edge with
probability P, do that for all edges -- you get a graph. You
look at the largest clique and you try to find that. Then you
get something like this for the case where P is one half. This
is the expected clique size.
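That figure is easy to probe empirically. A quick sketch (my own): build a G(N, 1/2) graph and grow a clique greedily, which typically reaches about half of the roughly 2 log2 N maximum.

```python
import numpy as np

def greedy_clique(adj):
    """Greedily grow a clique: scan vertices in order and keep any vertex
    adjacent to everything collected so far.  In G(n, 1/2) this typically
    finds a clique of size about log2(n), half of the ~2*log2(n) maximum."""
    clique = []
    for v in range(len(adj)):
        if all(adj[v, u] for u in clique):
            clique.append(v)
    return clique

rng = np.random.default_rng(0)
n = 512
A = rng.integers(0, 2, size=(n, n))
A = np.triu(A, 1)
A = A + A.T                                  # symmetric 0/1 adjacency matrix
c = greedy_clique(A)
print(len(c), 2 * np.log2(n))                # greedy size vs. expected maximum
```

Closing the factor-of-two gap between the greedy log2(n) and the true 2 log2(n) is exactly where the classical algorithms get expensive.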
Okay. So now, a classical algorithm can find cliques up to that
size in polynomial time -- for instance, by just doing some sort
of brute force search over all the clique candidates. You can do
this. So this is just my personal belief, but I believe that
this D wave computer might be good at finding cliques of up to
this size or maybe even larger, maybe a small constant times
this. But the arguments that were put forward in that paper on
random 3-SAT, I think they can be applied to the problem of
planted cliques.
So if you generate a graph at random and you plant in a huge
clique, say a clique of size square root of N, right, and then
you forget where it was, and then later you wake up and you want
to know, hey, where was that clique, right? So that problem of
the planted clique should be able to be handled by the exact same
arguments that these guys used for 3-SAT.
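Generating such a planted instance is a short exercise. A sketch (my own naming): sample G(n, 1/2), then wire up a random set of about sqrt(n) vertices into a clique and "forget" which set it was.

```python
import numpy as np

def plant_clique(n, k, rng):
    """Sample G(n, 1/2), then plant a clique on k randomly chosen
    vertices by connecting every pair inside that set."""
    A = rng.integers(0, 2, size=(n, n))
    A = np.triu(A, 1)
    A = A + A.T                          # symmetric 0/1 adjacency matrix
    members = rng.choice(n, size=k, replace=False)
    for i in members:
        for j in members:
            if i != j:
                A[i, j] = 1              # force an edge inside the clique
    return A, set(int(v) for v in members)

rng = np.random.default_rng(2)
n = 100
A, hidden = plant_clique(n, int(np.sqrt(n)), rng)   # plant, then 'forget' it
print(len(hidden))   # -> 10
```

The recovery task -- given only A, find `hidden` -- is the planted clique problem referred to above.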
So what I'm trying to say is it all boils down to the size of the
clique. Very large, I think these things can be done. Very
small -- I called it the tiny clique problem -- arguably, one
could do it with a D wave computer. And why is that interesting?
I think it would be extremely interesting.
If one could show that, using a D wave computer, one can find
cliques of that size or even two times that size, one would have
a super polynomial separation from the classical algorithm,
right, which runs in time, say, N to the log N in that case,
right? Because N to the log N is not a polynomial. It would
establish a --
>>:
Barely not a polynomial.
>> Martin Roetteler: It's just barely not a polynomial. Some
people call that quasi polynomial. But if that could be shown,
immediately it would show that a D wave computer, no matter what
it does, but it would do something you cannot do in polynomial
time classically.
>>: That is, the D wave computer is not the canonical
[indiscernible].
>> Martin Roetteler: Yes.
>>: It has, in fact, some artifacts of all sorts of --
>> Martin Roetteler: Yes.
>>: So let's think of this as an adiabatic quantum computer.
[indiscernible] it's also non adiabatic to a [indiscernible].
>> Martin Roetteler: Yeah, yeah.
>>: But it isn't that anyway. It is something else.
>> Martin Roetteler: It is something. The question is, what is
it good for? Can we have some evidence that this is good for
something? And I'm interested in kind of finding any such
evidence, because this device is there, and now the question is,
is it good for something?
So this problem with, like, tiny cliques -- I have no idea if
that can actually be done, because it would involve actually
finding a scheduling for that machine and then running many
experiments and having evidence that it's faster than the
classical algorithm.
But my feeling tells me that might be possible. One idea would
be, for instance, just take the regular scheduling, and then run
it to a point where the wave function is still extended across
all the solutions and then you don't keep going, because you
might get localized. But maybe just sample then. Maybe then the
distribution over the actual cliques is actually very local and
you will get a clique. I don't know. It's just an intuition.
>>: [indiscernible] the gap closing, which means the probability
of getting the right answer is --
>> Martin Roetteler: You want to stay away from there. You
would want to stay away from there.
>>: It's past that point that you get useful information,
though.
>> Martin Roetteler: The thing is to find that sweet spot where
we have not passed the point yet. You're close to the point,
your wave function is still extended, and then you want to
sample.
>>:
I have another quibble with this [indiscernible] paper.
>> Martin Roetteler:
Yes.
>>: Is that the most optimistic [indiscernible] localization is
perhaps that the gap would go to zero, maybe exponentially in
some power of the system size. So it may still be that Altshuler
and company could be right, but that you might go faster than
the --
>> Martin Roetteler: They had some kind of optimistic
assumptions about how fast it closes. They might have had, like,
the exponent -- maybe they assumed it's one or so.
>>: Whatever the exponent is, E to the L or E to the L squared --
>> Martin Roetteler: Exactly. And there was some criticism from
[indiscernible], some people at NASA. They criticized it, that
they assumed that the gap would close too fast. I thought that
Boris had a response to this. But I don't remember what the
response was. How he draws the intuition that it closed so fast
on the hypercube, I don't know.
But what I can tell you is [indiscernible] gave a talk very
recently at Princeton about this, and he had a plot of, like,
the bimodal distribution that everybody has seen, but he also had
some plots about what happens if you want to find the ground
state for some [indiscernible] models. And sometimes you don't
find it. And if you don't find it, what is the actual Hamming
distance of the point you find -- in terms of how many bit flips
you are away.
And there was a huge chunk of the distribution which was actually
N bit flips away, which is what Altshuler and company predicted:
that you would actually be kind of very far away from the
solutions. That's an argument for me that the intuition they had
in the paper is maybe on the money, that there is localization.
The analytic argument is that it happens at the very end, but it
might happen even before.
>>: Well, I mean, some of the numerical studies of what the gap
was [indiscernible].
>> Martin Roetteler: Right, yeah.
>>: It all depends on the problem also.
>>: Yeah, the [indiscernible] paper on it also had -- you could
also have a polynomially closing gap, but multiple gaps. So your
probability of jumping the gap goes up exponentially.
>> Martin Roetteler: That's right. It's very confusing. I find
it very confusing. For instance, also, there are, like, problems
in P -- just take clauses that are all linear, right. It's
clearly solvable on a classical machine with Gaussian
elimination, but even there it seems to fail, right, if you just
take linear XOR clauses or so -- and you guys will know more
about this, I'm sure. But my feeling tells me that machine might
be interesting to do some things. Like maybe things related to,
say, just storing information and retrieving it. I don't know if
anybody has looked at that. But in the case of classical
networks, like Hopfield networks, it's known -- you know, maybe
not exactly known, but the capacity of these networks has been
narrowed down to a very small region, like maybe between 0.138
and 0.15 times N, if N is the number of spins.
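That capacity figure can be demonstrated with the standard Hebbian rule. A minimal sketch (mine, not from the talk): store random +/-1 patterns well below the ~0.138 N capacity and recover one from a corrupted probe.

```python
import numpy as np

def hopfield_recall(patterns, probe, steps=20):
    """Hebbian Hopfield network: store the +/-1 patterns in the weight
    matrix, then iterate sign(W s) to clean up a corrupted probe.  This
    works reliably while the number of patterns stays below roughly
    0.138 * N, the capacity mentioned in the talk."""
    P = np.asarray(patterns, float)
    n = P.shape[1]
    W = (P.T @ P) / n                # Hebbian outer-product weights
    np.fill_diagonal(W, 0)
    s = np.asarray(probe, float)
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1.0              # break ties deterministically
    return s

rng = np.random.default_rng(3)
n, m = 200, 10                       # 10 patterns, far below 0.138 * 200
pats = rng.choice([-1, 1], size=(m, n))
probe = pats[0].astype(float).copy()
flip = rng.choice(n, size=20, replace=False)
probe[flip] *= -1                    # corrupt 10 percent of the bits
rec = hopfield_recall(pats, probe)
print(np.mean(rec == pats[0]))       # fraction of bits recovered
```

Pushing `m` toward 0.138 * n makes retrieval break down, which is the capacity boundary being discussed.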
>>: Much better [indiscernible], but I can do it in a circuit
model better than I can do it in a -- that actually does that and
gives you beautiful superpositions of storage states, like 50
percent storage, but then you get everything out with Grover.
Like you only get it out once. We're over time. You need lunch,
so let's do that. We'll talk after you're done.
>> Martin Roetteler: Okay. But that's interesting for me to
know, because I thought maybe that could be an indicator that
what happens in that system is, indeed, quantum and useful. Like
the capacity -- I mean, we all know we cannot store more than N
bits in N qubits, right, because of the Holevo bound. But maybe
if the constant is just different, maybe it's an indicator that
what happens is quantum. Maybe it's useful for that reason.
That's the most sketchy part of my presentation, but I just
wanted to get it across. I don't think it's completely bad, what
happens in the developments around the machine. Of course,
there's a lot of hype and so on, a lot of press.
All right.
Thanks a lot.
Thanks for sticking around.