>> Kristin Lauter: So welcome, everyone. Thank you for coming. It's my great pleasure to
introduce Hao Chen to give a talk today on the hardness of problems underlying homomorphic
encryption. Hao is a PhD candidate at the University of Washington in number theory, working
with William Stein. He's contributed to the Sage Math project, and he's made significant
contributions to understanding the security of the RLWE problem, which is important for
applications such as homomorphic encryption, so thank you, Hao.
>> Hao Chen: Thanks. All right, so thanks for coming. It's good to be back. Today, I'm going
to talk about hardness of problems underlying homomorphic encryption, and the subtitle is a
survey on Ring-LWE cryptography, so I plan to do both. So a little bit about myself, so
Kristin already said a lot, so maybe I'll just mention that I had a fantastic time last summer
interning at MSR here, and I worked on this RLWE cryptography, so that's sort of my first
entry into this subject, and since then, I have been working on it. It's truly amazing and a lot of
fun. So before I start the talk, maybe I'll give a little overview of the structure of the talk. So we
have three parts here. The first part is the introduction, where I'm going to talk about the basic
ideas of homomorphic encryption and the Learning-with-Errors problem, and in the second part,
I'm going to introduce the Ring Learning-with-Errors problem, the definitions and the hardness
in the classical papers. And in the third part, I'm going to give a survey on attacks on Ring
Learning-with-Errors. This includes the already-known attacks on RLWE, and the new attacks
we developed over the last summer and since then. And then I kind of in the end give a
summary of the current security situation of RLWE and possible future directions. Okay, so
let's get right into it. Okay, so homomorphic encryption, what is it? The scenario is like this.
Suppose Addison is a user who wants the cloud to process her data without exposing her data to
the cloud. So this is a very general scenario, because usually you would have some sensitive
data that you don't want to reveal to people, but you still want to outsource that computation. So
formally, if you have data X and a function F, and you want to compute F of X, so what you
would do is to encrypt the data X and send the encrypted data and the function to the cloud. The
cloud, not knowing what X is, must choose some other function, big F, to evaluate on this
encryption of X, such that when it sends back big F of the encryption of X and you decrypt it, then you get your
function value back. So at the end of the day, everyone's happy. So the question of course is
whether this is possible at all. So first, if you want to ask this question, then maybe you want to
ask what kind of F do you want to evaluate. So suppose you have an encryption function. Then
maybe the most immediate choice of function class is the polynomials, so it turns out that, well,
if you can do the addition and multiplication operations on the encrypted data, then you can
evaluate all polynomials, at least in theory. This is called fully homomorphic
encryption, so that's just a name for it, and you see the property is that you want two operations
that mimic plus and multiply on the encrypted data, such that when you do the operations, they correspond to the usual addition and multiplication on the plaintext.
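To see the property as code, here is a minimal Python sketch of the check any candidate scheme must pass; the names enc, dec, hadd and hmul are placeholders for a scheme's operations, not an actual construction.

    # A property check, not a scheme: homomorphic encryption must satisfy this,
    # where enc/dec/hadd/hmul stand in for a candidate scheme's operations.
    def check_homomorphic(enc, dec, hadd, hmul, x, y):
        cx, cy = enc(x), enc(y)
        assert dec(hadd(cx, cy)) == x + y   # ciphertext "plus" mirrors addition
        assert dec(hmul(cx, cy)) == x * y   # ciphertext "times" mirrors multiplication

So homomorphic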
encryption has a lot of applications. So basically, if you have any kind of sensitive data, then it
will be a place where you can apply homomorphic encryption. So examples include predictive
analysis of private medical data and the CryptoNets project, which is a successful project of Microsoft applying machine learning: you evaluate a trained
neural network on encrypted data. And predictive analysis of genomic data, encrypted
databases, statistical analyses and biometric authentication, homomorphic key management, so
this is something I just learned yesterday when I was searching online -- so there's this company,
Porticor, which claims that they have this homomorphic key management, which sounds a little
interesting to me, so I just downloaded their whitepaper, but basically, they're trying to say that
they're using homomorphic encryption to manage users' keys, but not for the cryptography part.
And then also other applications, including multiparty computations, secret sharing scheme and
election schemes, etc. So you can see that there is a growing area of applications of fully
homomorphic encryption. And it is possible, and in a very groundbreaking work in 2009, Craig
Gentry, in his PhD thesis, proved that there is a way to do fully homomorphic encryption, and
the way he does it is to apply ideas from ideal lattices to the problem. But the
problem with this original proof of Gentry is that it's very impractical. In particular, there's this
one very central ingredient of the proof that uses a technique called bootstrapping, which is very,
very inefficient in practice, so people have been thinking about how to improve homomorphic
encryption schemes since then, and there has been tons of theoretical research, and actually,
since 2010, up to today, a lot of way more efficient schemes have been developed by
various people. And up to now, the most promising homomorphic encryption schemes, they all
base their security on two problems that are derived from -- from the standard lattice problems.
One of them is called Learning with Errors, which we will abbreviate with LWE, and the other
one is sort of an extension of that problem, which we call Ring-LWE, and they are developed
and introduced in 2005 and 2010, respectively, so that's what we'll focus on the rest of the talk.
And the examples of homomorphic encryption schemes using -- based on these two problems,
are one scheme made by Brakerski, Gentry and Vaikuntanathan, and the YASHE scheme by
Bos, Lauter, Loftus and Naehrig. So let's talk about what is the Learning-with-Errors problem.
So this is one screenshot I took from Regev's 2005 paper, and I think it's a very intuitive
demonstration of what the problem actually is. So you're given a prime Q and a positive
integer N, and your secret is a vector with value in the finite field FQ, so here it's denoted by ZQ.
So this is the secret. And you're asked to solve the secret given a sequence of linear equations,
but these linear equations are not exact in the sense that, so if you look at the first equation,
modulo 17, this is only approximately equal. It means that this value here can be 8 or it can be
7 or it can be 9 or some other values around 8. So then what it means is that even if you're given
-- even if you're given more than four of the equations, it's not guaranteed that you can solve for
the secret using this system, because this is not an equality sign. So the problem is posed in this way, and that's the Learning-with-Errors problem.
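To make the sample generation concrete, here is a minimal Python sketch of producing LWE samples; the parameters n, q and the error set are illustrative choices matching the toy example on the slide, not Regev's recommended ones.

    import numpy as np

    rng = np.random.default_rng()
    n, q = 4, 17                       # toy dimension and modulus, as in the slide
    s = rng.integers(0, q, size=n)     # secret vector in (Z_q)^n

    def lwe_sample():
        """One noisy equation (a, b) with b approximately <a, s> mod q."""
        a = rng.integers(0, q, size=n)      # uniformly random coefficients
        e = int(rng.integers(-1, 2))        # small error in {-1, 0, 1}
        return a, (int(a @ s) + e) % q

Given many such samples, the task is to recover s even though no single equation is exact. So that's not the central topic of my talk, so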
I'll just talk briefly about what's the hardness result for LWE. So this is proved by Regev in his
2005 paper. Solving an N-dimensional LWE when the modulus Q is polynomial in the
dimension N implies an equally efficient solution to a hard lattice problem in dimension square
root of N. So I'll talk more about in one of the later slides what are the hard lattice problems we
are talking about. But basically here, you only get a square-root reduction in the dimension, and this lattice problem gets harder when the dimension is larger, so here, you don't really get equality in the dimension. And saying a lattice problem is hard just means that the best known
algorithm is not polynomial in the dimension, or rather, it's exponential or sub-exponential, and
if you allow quantum algorithms, then you can replace square root of N by N, which means that
in a world where we have quantum algorithms, this LWE problem is at least as hard as the hard
lattice problems of the same dimension. So that's the main hardness result for LWE, and now
let's talk about what is Ring-LWE. So when people developed this idea of Ring-LWE, the
motivation was to make the LWE-based homomorphic encryption schemes more efficient. So
somehow, using this scheme, you can pack more information into a plaintext, so encryption
becomes more efficient. So let's review the classic setup of Ring-LWE, so here, you take the
dimension N to be a power of two, and you take modulus Q to be a prime, which needs to be
congruent to 1 modulo 2N, and your ring R is the polynomial quotient ring of the polynomials in
one variable with integer coefficients modulo the polynomial X to the N plus 1. So now the
problem is you are given the secret S of X as a polynomial in this ring, and your goal is to solve
for this secret polynomial using a random noisy system, so here, the situation is sort of similar to
the LWE setting, but here you have random coefficients A of X, which is sampled uniformly
from this ring R, and you have small error polynomials EI of X, so these EIs are unknown. And
also, like, if you know one of the errors is zero, then this becomes a trivial problem, because you
can just divide by, say, A0. And the real hardness of this problem is that you don't know what
these EIs are. And just by the way, when I say error, I mean these polynomials have small
coefficients, so that's the problem.
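For concreteness, here is a minimal Python sketch of generating RLWE samples in the two-power cyclotomic ring just described; the toy parameters N = 8 and q = 17 (so q is 1 mod 2N) are illustrative.

    import numpy as np

    rng = np.random.default_rng()
    N, q = 8, 17                       # toy parameters: N a power of two, q = 1 mod 2N
    s = rng.integers(0, q, size=N)     # secret polynomial as a coefficient vector

    def mul_mod(a, b):
        """Multiply coefficient vectors in Z_q[x]/(x^N + 1), using x^N = -1."""
        c = np.convolve(a, b)          # raw product, degree up to 2N - 2
        lo, hi = c[:N].copy(), c[N:]   # coefficients of x^(N+i) wrap to -x^i
        lo[:len(hi)] -= hi
        return lo % q

    def rlwe_sample():
        a = rng.integers(0, q, size=N)     # uniformly random ring element
        e = rng.integers(-1, 2, size=N)    # error polynomial with small coefficients
        return a, (mul_mod(a, s) + e) % q

And how hard is this problem? Well, before we discuss the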
hardness, maybe I'll just try to generalize the LWE problem to arbitrary rings, so here, we're
really using a very restrictive class of rings, so remember, N needs to be a power of two, so it
grows very quickly and it's very sparse, and we have relatively few parameter choices. But
actually, you can define this for fairly general situations, so we have essentially four parameters
that we can choose, the ring R, the modulus Q, the error distribution chi, and of course whether
the error is discrete or continuous. So I'll talk about this more, but it turns out to be a nontrivial
assumption, so this fourth parameter is a Boolean variable. Okay, so here, what's the ring R? So
usually in this problem we take R to be the ring of integers of some number field, and in some of
the -- in the most classical RLWE paper, they also used the dual ring of R instead of R, but in the
case of two power cyclotomic field, which is what this ring is corresponding to, the two are
essentially the same. And Q is, again, the prime, and continuous just means that we use
continuous error or do we use discrete error, and chi just represents a Gaussian distribution,
whether it's in N-dimensional Euclidean space or it's in R, so in the continuous case, we sample the Gaussian distribution in Euclidean space. And in the discrete case, we sample them in R, so
I'll talk about how we do that. And we denote the spherical Gaussian of width R by DR, so the reason
here is usually when people talk about these problems, they want to sample the error in a
spherical way. So then how would you produce such a noisy sample? Well, in the continuous
setting, you fix the secret to be an element of this quotient ring, which is a finite set of cardinality
Q to the N, and think of it as that, and you choose A uniformly from R mod QR, and the error E from a continuous Gaussian on RN, and then the sample you output will be A,B, where B is equal to
AS plus E. So here, the first coordinate is uniform and the second coordinate -- well, the goal is
to make it look uniform inside this RN modulo QR dual. So QR dual is actually via some kind
of embedding becomes a lattice, so this is a high-dimension torus, and the error should be
scattered inside that torus, so that's the picture. And in a discrete setting, basically, almost
everything is the same, but you choose your error from -- sorry, you choose your secret from the
original ring, not the dual ring, and you choose A uniformly and E from a discrete Gaussian on R, and the sample you output will also be A,B, but now the sample will lie in a discrete set instead of
a torus. So when you choose R to be the ring of integers of the two-power cyclotomic field, this
goes back to the previous definition of RLWE. Yes.
>>: Is it essentially the definition of a continuous error selection is that you choose from a
continuous Gaussian and discrete Gaussian? Is that right?
>> Hao Chen: Yeah, yeah.
>>: So can you expand on what those two mean?
>> Hao Chen: Yeah, yeah, I'll talk about it. So a continuous Gaussian is a multivariate Gaussian
distribution, and we -- yes, I think we need it to be spherical, and I'll talk about what's a discrete
Gaussian. So maybe this is not in my slide, but if you have -- so if you have an N-dimensional
lattice in RN, then you can give each lattice vector a weight, which is e to the minus the norm of
the vector squared over some parameter, so you sample in this discrete set, and the probability of
sample each vector is proportional to the weight. So for example, if you have -- if the weights
parameter is 1, then you sample the 0 vector with probability e to the minus 0, which is 1, but
then you need to scale everything so that the total probability is 1, so that's what -- I think that's
what I mean by proportional. Yes. So again, R is just a ring. Yes, sorry.
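Here is a small Python sketch of the one-dimensional case just described: sampling a discrete Gaussian on the integers by truncating the tail, which is why any such sampler is only an approximation. The width convention, weight proportional to exp(-v^2/s^2), follows the talk; some papers put a factor of pi in the exponent.

    import numpy as np

    def discrete_gaussian_1d(s=1.0, tail=10, rng=np.random.default_rng()):
        """Approximate sample from D_{Z,s}: integer v gets weight exp(-v^2 / s^2).

        The infinite support is truncated at about +/- tail * s, so this only
        approximates the true distribution (the discarded tail mass is tiny)."""
        cut = max(1, int(round(tail * s)))
        support = np.arange(-cut, cut + 1)
        weights = np.exp(-(support.astype(float) ** 2) / s ** 2)
        return int(rng.choice(support, p=weights / weights.sum()))
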
>>: So you may have said this already. Are these both variants that are both in existing papers?
>> Hao Chen: So the continuous error is used in the original LPR paper in 2010, and the
discrete error is used in the 2015 Crypto paper by Kristin and her coauthors on attacking RLWE -- coming up with vulnerable instances of RLWE in a number of
fields.
>>: Most of the proposed schemes fall in the first category?
>> Hao Chen: The?
>>: The proposed schemes fall in the first category?
>> Hao Chen: The proposed scheme?
>>: The constructions that people are proposing.
>> Hao Chen: So in practice, what people use is this discrete version, because it's easier to
handle.
>>: But what they write about in the paper is the continuous version.
>> Hao Chen: Right, yes.
>>: And the point is they're the same. The [indiscernible] the same there, but for other rings,
they're not the same.
>>: But didn't LPR already discuss the discrete case?
>> Hao Chen: Very briefly.
>>: Very briefly.
>> Hao Chen: They talked about whether -- they sort of implied that discrete is as hard as
continuous when the error is large enough, but they didn't give explicit proof. Okay. So I'll talk
about how to put a discrete Gaussian distribution on this R, because R is not a lattice yet, but
we're going to make it into a lattice, so maybe I'll just give some background on number fields and number rings. So number fields are finite extensions of the rational numbers, so examples are like Q adjoined square root of 2, Q adjoined the square root of minus 1 or Q adjoined an Nth root of unity. And the number ring is the ring of integers in such a field, so for example, you just
replace this Q by Z in these cases, and you've got the number rings, and they can be realized as
polynomial quotient rings in this case. So it turns out that if you have any number field K, then it
has the canonical embedding into the Euclidean space RN, and the image of the ring of integers
becomes a lattice, and so you can think of it as the image of the whole field K is a dense set in
RN, but if you only look at the ring of integers, then it's discrete. So, for example, if you take Z
adjoined square root of 2, then the image of R in the canonical embedding becomes the lattice
spanned by these two vectors, spanned by the column vectors.
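As a small Python sketch of this example: the two real embeddings of Q adjoined square root of 2 send square root of 2 to plus and minus square root of 2, and the images of the ring basis 1 and square root of 2 span the lattice.

    import numpy as np

    def embed(a, b):
        """Canonical embedding of a + b*sqrt(2) into R^2, one coordinate per
        real embedding: sqrt(2) -> +sqrt(2) and sqrt(2) -> -sqrt(2)."""
        r = np.sqrt(2.0)
        return np.array([a + b * r, a - b * r])

    # Columns are the images of the basis {1, sqrt(2)} of Z[sqrt(2)]:
    basis = np.column_stack([embed(1, 0), embed(0, 1)])
    print(basis)   # [[1,  1.414...], [1, -1.414...]]

So there's a canonical way of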
doing this, and now we're going to talk about -- so now we have this lattice in RN, and
supposedly, we want to base our security on some hard lattice problems by showing that the
RLWE problem is as hard as these lattice problems, but maybe I'll first introduce what kind of
lattice problems we're talking about.
>>: So what is your definition of R in the previous slide?
>> Hao Chen: Here?
>>: The image of R, so you're --
>>: The ring of integers.
>> Hao Chen: So R is the ring of integers of K. So in the classical case, K is the two-power
cyclotomic field, and R is the ring of integers of K. So now if you have an arbitrary lattice in the
Euclidean space, given by a potentially very bad basis, then we'll recall this notion of lambda I of
the lattice lambda. That's the smallest number S such that the ball of radius S contains at least I
linearly independent vectors in the lattice. So if you take I to be 1, that's the length of the
shortest vector in the lattice. And now let gamma be a positive parameter. The gamma SVP or
the gamma shortest vector problem is to find the lattice vector in lambda whose length is
bounded by gamma times the shortest vector length. And the closest vector problem, called
gamma CVP, is that you're given a point in the Euclidean space, and you want to find a lattice
point such that the distance between those two points is bounded by gamma. And then there's
also this discrete Gaussian sampling problem called gamma DGS, where you're given a gamma
and the lattice lambda, and you want to output a sample from the discrete Gaussian. So again,
I'll just briefly mention it again. A discrete Gaussian is where you sample each lattice vector,
and the probability you sample a vector decays exponentially in its squared length, so shorter vectors are much more likely. Yeah?
>>: So the hardness problem is the adversary gets -- it's either in an actual discrete Gaussian?
>> Hao Chen: Oh, here? So here, the problem is actually to sample from this distribution,
because it turns out to be hard.
>>: The definition of the problem can't just be output a sample, because you'd just output a
number and that would be a valid sample, something else.
>> Hao Chen: Yes, sorry. So if you output a random number, that would be a sample from
other distribution, but it's not guaranteed to be sampled from this particular distribution. So if
you -- so maybe let's think about the most one-dimensional case, where you want to sample a
discrete Gaussian on the integers, so there's no known way to do that, because the integers are an
infinite set, and you can only approximate that process.
>>: So I'm just wondering, so the actual hardness problem is stated in terms of distinguishing
some algorithm that helps to do this from an actual true Gaussian?
>> Hao Chen: So in the hardness proof, what they do is they suppose they have an algorithm to
output samples in this distribution, and then they show that they can have some kind of security
reductions. So the proof in LPR is -- to me, it seems like it's an iterative step that encodes also
the bounded distance decoding problem, so it has two steps. One is reducing the bounded
distance decoding problem to RLWE, and the other one is that if you have the discrete Gaussian
sampling and RLWE, you can solve BDD, and then you just use them like again and again until
you get to the point. So I don't see -- so maybe that's not a direct answer to your question, but
the problem here, I think -- so are you asking if this problem is well posed?
>>: No, I'm just asking what is the --
>>: I think he's asking for something similar. What you suggest is probably right, that he's just
asked -- the usual statement of these hardness problems has to do with like a decision version
would be like can you distinguish whether this was a sample? Whereas you're talking about
more like the constructive version or whatever, which would actually be --
>> Hao Chen: Yes.
>>: -- to output a sample, so the decision version would be just to distinguish whether it is --
>> Hao Chen: Yes, exactly. Yes. So for all these problems, there is a corresponding -- I think
for the first two problems, there is a corresponding decision version. For DGS, I don't know yet
whether there exists, but I think there could be.
>>: So there, your goal is to produce -- your adversary has to produce an algorithm that samples
from something that's close to this distribution? So for example, this tends to produce short
vectors.
>> Hao Chen: Yes, yeah. So one intuitive way to see why this problem is hard is that when
your gamma is zero, it's reduced to finding the closest vector, because in general, you can
consider this problem by shifting your lattice by a point, and if you're shifted by -- so basically,
when this gamma gets smaller, you're more likely to sample short vectors than long vectors, and
this becomes a hard problem.
>>: Yes, so the point is it's very difficult to do this, because the Gaussian sampling algorithm
requires a very good basis.
>>: I'm not asking about why it's hard. I'm asking about is this a distinguishing problem?
>> Hao Chen: Oh, this one? No, this is not a distinguishing problem.
>>: Then how do you -- I don't understand the experiment that would invalidate the assumption.
>> Hao Chen: I'm sorry?
>>: Just like a computational problem requires you to find the secret exponent, this requires you
to produce a sample.
>>: Right, but if it requires you to produce a sample, I can output 0. That's a valid sample. So
there must be some way which would evaluate whether somebody --
>>: Or, for example, just using the basis, you can sample vectors in the lattice. They should be --
>> Hao Chen: So maybe in the [indiscernible]. Maybe in the -- yes, I have some doubts about
that, but I think maybe a good way to say that is that you want to consistently output samples
from this, like you want to have an algorithm that always gives you samples. So for example, if
you want to keep outputting 0, then that's not a valid algorithm for this problem.
>>: The adversary has no decent algorithms, so it's more --
>>: It's an [indiscernible] assumption, so there's no way -- if somebody gives you an algorithm without proof, then you can't tell if it realizes a particular distribution. You're just assuming that it --
>> Hao Chen: Yeah, yeah.
>>: So I guess you want to have like -- you want to have an algorithm that produces a
distribution that's indistinguishable from this discrete Gaussian.
>> Hao Chen: Yeah, yeah, that would be a good way to define it. Because, right, because in a
security reduction, actually, what they do is they assume that they have this, and they use it to
produce a lot of samples, so just having one is kind of not enough. But yeah, the way I say it, it
makes people wonder whether you just need to output like a random thing. But yeah, thank you,
that's a good point. All right, and these problems get harder when gamma is smaller, so what are
the main hardness results of Ring Learning-with-Errors? So in the LPR paper, they proved that
if the error rate R is greater than square root of log N, then for any ideal in the ring of integers R,
there is a quantum reduction from gamma DGS to RLWE, and here this gamma is given by Q
over R times some constant. So if you look at this formula, it basically says that when Q is large,
then this problem is easier, and when R is large, this problem gets harder, so that's kind of the
intuition. And the real security reduction is complicated, and as described, it utilizes this
iterative step, and the intermediate problem called bounded distance decoding, and one of them
is quantum, the other one being classical. So if you really want to trace down the steps, then you
have to iterate between quantum and non-quantum steps. And if -- so this is for any number
field and any ideal. If you restrict to the case of cyclotomic fields and the ring of integers in
them, then you can replace the gamma DGS in the above theorem by soft [indiscernible] and
then Q over R SVP. So that means that for these fields, if you can solve the RLWE problem,
then you can also solve a certain approximate SVP problem. And in addition, if you also have
the case where you have cyclotomic field and your prime is congruent to 1 modulo N, then you
can replace search with decision. And I haven't talked about decision, but the decision RLWE
problem is not to discover the secret S but only to distinguish a set of RLWE samples from
uniformly random samples. So that's considered to be easier, but it turns out that if you have
these two assumptions, then they are basically as hard as the same -- of the same hardness. And
there is an informal claim in this LPR paper when they said when the error is large enough, then
you can do this randomized rounding technique that shows that discrete error sampling is at least
as hard as continuous. But there is no obvious direction going backwards. So basically, what it
means, that if you have a continuous problem, you can round it to a discrete version, and then if you
can solve that discrete version, you can solve the continuous. But it's not clear whether that
applies when the error is small, because the rounding produces some inherent noise that's
independent of the error size. So now, after these hardness results, what's left to do? Well, that's a
very good question, because the paper did not really cover everything in the theory, so what if
the error is small or the error is discrete? So these two are not discussed in the paper. And also,
what if the modulus you choose is so large, so that maybe the SVP problem is not hard.
Remember, the SVP problem is only considered to be hard when you have a polynomially bounded gamma, but if you have an exponential gamma, well, then maybe that's easy. And also,
what is the hardness of decision RLWE for general number fields? Remember, this is only
established in LPR for cyclotomic fields, so if you want to choose your field to be something
else, then you have to worry about this problem. And also, another point is in practice,
homomorphic encryption schemes require the error to be smaller for them to be more efficient,
because if you have -- in general what they say, a leveled homomorphic encryption scheme, you
can only evaluate circuits of a certain depth. And the number of multiplications you can do will
depend on the error growth, and if the error is larger, then it means that you can do fewer
multiplies before you have to re-encrypt everything again and start over. So it's definitely
desirable to use smaller errors, so these are the motivation of why we still study the hardness
problem of general RLWE. And what is our goal? So one goal is to explore the boundary of
security for all types of RLWE problems, regardless of whether the error is small or large,
whether it's continuous or discrete or we are using the dual ring or the original ring, so I think
that's something of value to the cryptography community, and also to the more practical side, if
you want to clarify the security of RLWE schemes used in practical applications, so maybe we want to advise on what parameter choices are secure -- if you want to use small errors, how small can you get before it gets insecure? So that's another question we hope to
answer. Okay? So the goal very naturally leads to study of the attacks on these RLWE
problems, so in the third part of my talk, which is also the last part, I'm going to talk a little bit
about attacks. So what are some of the attacks on RLWE? Well, the first way is to transfer the
instance. Well, it turns out that there is a way to do that. You take RLWE samples. You can
produce somehow like LWE samples, and then you can use standard attacks on LWE, which
forgets about the ring, and you have more generic ways of attacking. So you have linear algebra
and BKW attack, Arora and Ge and Laine and Lauter, so I'll talk about them individually. And
then, well, if you don't want to transfer to LWE, maybe you can exploit the ring structure of R,
so the first entry of my -- into this subject is this paper by Elias, Lauter, Ozman and Stange in
'15, and they talk about an attack using a reduction map from the ring R to a smaller finite field
and do the -- perform the attack over there. So that attacks the decision problem, and myself,
Lauter and Stange in a '15 paper, which we are submitting and following up, which we are
submitting soon, we talked about attacks that further exploits the ring structure and also the
ideals used in a reduction. So I'll talk about those. So how do you transfer an RLWE sample to
an LWE sample? So if you have the ring of integers R, it has the basis over the integers, so you
can think of it as N vectors, omega 1 to omega N, and if you fix that basis, and you're given an
RLWE sample A,B, well, then you can write each coefficient of B in terms of this basis as a
matrix acting on the secret vector plus the corresponding coefficient of the error. So if you think
about it as the classic RLWE that I talked about, this is just taking the coefficients of the
polynomials. So if you have two polynomials, then you take the first coefficient and the
coefficient of X, coefficient of X-squared, so it gives you a bunch of linear equations, because
this MA now becomes a matrix with known entries that are uniformly random. So that means
from one RLWE sample, you actually produce N LWE samples, so here, LWE is in quotes,
because this MA, its rows are not really independent. They're actually permutations (up to sign) of the first row, so maybe -- well, just let's replace this N by 1 if you want, but still, you can apply standard LWE attacks to solve RLWE.
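Here is a Python sketch of that matrix MA in the two-power cyclotomic case, where multiplication by a(x) modulo x^N + 1 is negacyclic, so the rows are signed rotations of the first one; the function name is mine, not from any library.

    import numpy as np

    def negacyclic_matrix(a, q):
        """Matrix of multiplication by a(x) in Z_q[x]/(x^N + 1): column j holds
        the coefficients of a(x) * x^j, so that b = M_a @ s + e (mod q)."""
        N = len(a)
        M = np.zeros((N, N), dtype=int)
        col = np.array(a) % q
        for j in range(N):
            M[:, j] = col
            col = np.roll(col, 1)        # multiply the column by x ...
            col[0] = (-col[0]) % q       # ... wrapping x^N around to -1
        return M

So how does this work? So assume this situation, that under a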
chosen basis, you have some coordinate E1 that's so small that you know it's 0, so this is
like the first situation that's very bad, and people usually avoid it. Well, this is very insecure,
because then what you have is an exact equation in S, so that means that once you have N
samples, you will be able to just perform Gaussian elimination to solve for the secret, so yeah,
you just do linear algebra. Yeah.
>>: So why would one coefficient be known to be 0, simply from the Gaussian restriction?
>> Hao Chen: Yes, so people are using sampling from a spherical Gaussian distribution, but the
problem is that if you choose a basis, then the basis vectors have different lengths and the angles
between them are different, so the coefficient doesn't really -- sometimes, it doesn't really reflect
what -- so, for example, if you have a really long vector, then although you are taking a spherical
distribution, then the coefficient of that vector might still be very small, because if you want to
scale it back into the ball you're sampling. So, for example, this happens for this class of number
fields, K equals Q adjoined the Nth root of 1 minus Q, in a Crypto paper by Kristin, [Kate] and
other coauthors. So in this case, they have a basis of the ring of integers, which is very skewed
in the sense that the length of the basis vector forms a geometric series. So then the longest
vector basically never gets sampled, so this coefficient E, maybe EN minus 1, is always 0 in their
parameter range. So this, for a cyclotomic field, this usually does not happen, but for general
number fields, anything can happen, so we want to exclude this case first, because that's the
easiest way that you can break an RLWE system. So that's that, and also, you have the Arora-Ge attack, which I think is really neat. Unfortunately, it's not super-practical once the error is not too small, but I think it's still valuable to discuss it. So what is it? If we
assume the coordinate E1 is not 0, but now let's assume it takes values in some small integers, so
maybe minus 2, minus 1, 0, 1, 2 -- sorry, just minus 1, 0 and 1. So it
takes three values. Then you can take a polynomial F that vanishes on this set. For example,
here, you can just take X-cubed minus X, so X-cubed minus X will vanish on all these three
values. And now you have your equation where I have moved MA times S to the left-hand side.
You apply F to the left-hand side, it becomes 0. So here, you have eliminated the error, and this
becomes an exact equation. But now what's the problem? Well, the problem here is that this is a
linear form in a secret -- the secret has N components, and you're taking a polynomial of degree
D and evaluating on it, so if you want to expand it, you get all kinds of monomials made of the
secret of total degree D. For example, if D is 3, you may get S0, S1, S2, S0-cubed, S0-squared,
S1, all kinds of things. So how many monomials are there? They're big N to the D, and then
what you're actually doing is that you're solving a linear system in these variables, not the real
Ss. But now, you need to do Gaussian elimination in O of N to the D variables, so if D is larger
than 5 and N is larger than 1,000, then it's not very practical.
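The blow-up is easy to see numerically; here is a tiny Python sketch counting the unknowns after Arora-Ge linearization, i.e., the number of monomials of degree at most D in N variables.

    from math import comb

    def arora_ge_unknowns(N, D):
        """Monomials of total degree <= D in N variables: C(N + D, D).
        After linearization, each monomial becomes one fresh unknown."""
        return comb(N + D, D)

    for N, D in [(10, 3), (100, 3), (1000, 5)]:
        print(N, D, arora_ge_unknowns(N, D))   # roughly N^D: 1000^5 is hopeless

And for LWE, you also have this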
BKW attack, which is exponential, and again, it's an interesting theoretical attack, but in terms of
practical performance, it's not that good. Well, what's the idea? Well, the idea is you want to
still do Gaussian elimination, but remember now you have some error-containing equations, so every time you do an addition, like if you want to add a row to another, then it increases the error, so the error basically doubles. And if you want to add a multiple of the row to the other, then it gets worse, so you want to do it by using as few row operations as possible, and
also you want to keep a dictionary, like maybe for the first 10 coefficients of A and store
everything in a database, and you keep getting these samples until two of them happen to be the
same, so you can already see that this is an exponential algorithm, because you have to wait
exponential times, like maybe 2 to the N. So if you have two equations where the first 10
coefficients are the same, then you can subtract them, and then you have eliminated one part of
your equation, so you keep doing this, and if you organize yourself correctly, then you have an
algorithm to solve LWE, but this also only works when the error is small enough. So what is the
third point, which is in a paper published by Laine and Lauter (in the audience), in '15, where you
assume that the modulus Q is not polynomial in the dimension of the lattice, but it's exponential
in N. So in this case, you can turn the problem into a gamma closest
vector problem on certain N plus D dimensional lattices, so here D I think is the number of
samples. So you have a big lattice, and you can solve a closest vector problem on this big lattice,
and that will give you exactly the secret back, and this works when the modulus is exponential in
N. So in the paper they have concretely coded attacks that succeed in a matter of hours, so
how do you solve this gamma closest vector problem? You can use a standard lattice algorithm,
like LLL or a [indiscernible] algorithm. And the complexity is polynomial in N.
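As a toy illustration of the CVP step (not the Laine-Lauter code, whose lattice construction is more involved), here is Babai's round-off in Python; it approximates the closest lattice vector and works well when the basis is short and nearly orthogonal, for example after LLL.

    import numpy as np

    def babai_round_off(B, t):
        """Approximate the lattice vector closest to t, where the columns of B
        are a (preferably LLL-reduced) basis: round the coordinates of t in B."""
        coeffs = np.rint(np.linalg.solve(B, t))
        return B @ coeffs

    B = np.array([[3.0, 1.0], [1.0, 2.0]])                  # toy 2D basis (columns)
    t = B @ np.array([2.0, -1.0]) + np.array([0.1, -0.2])   # lattice point + noise
    print(babai_round_off(B, t))                            # recovers B @ (2, -1)

So now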
let's go to the attacks that actually exploit the ring structure of the corresponding RLWE problem, and these turn out to be harder to find. In this very nice paper by Elias,
Lauter, Ozman and Stange in 2015, they assume that you have the ring R as the polynomial
quotient ring of Z of X, quotient now by some polynomial, where you assume the polynomial
evaluated at 1 is 0 modulo the modulus Q, so this doesn't usually happen, but it sometimes does.
So when it does, you consider the reduction map from the ring R to the finite field of Q elements,
and it takes any polynomial to its evaluation at 1 mod Q, so because you have this assumption, this pi
is a valid map. It is well defined. And then you assume something on the number field that if
you take the error E and apply the reduction map pi, then it won't go to the uniform distribution
in the finite field FQ. So also, this doesn't happen for every number field, but let's assume it does
for the moment. And then the idea of attacking is just to loop through all guesses of pi of S, the image of the secret, because pi of S can only have Q possibilities, so looping over Q possible
choices is actually not that bad. And then if you have a guess, what you do is you compute the
hypothetical error, because if this guess is correct, then you're getting pi of E. If it's not, then
you're getting maybe some garbage, so you can detect whether this is a garbage by using the
assumption that pi of E takes small values, so if you happen to get something with large values,
well, maybe you try again. If you get it again, then probably this is not the good guess, and then
you keep doing this until you get only one guess G left. And the complexity is a big O of Q, because you're looping over Q guesses, and some examples of vulnerable number fields also include this
case, but notice that the linear algebra attacks also apply to this class of number fields, but only
when the error is small enough. So I think -- currently, I think that the techniques used in this
paper can be applied even when the linear algebra attack doesn't.
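A minimal Python sketch of this guess-and-check loop, under the assumptions above: samples holds pairs (pi(a), pi(b)) in F_q, and the small set standing in for "pi of E takes small values" is an illustrative choice.

    def guess_pi_of_s(samples, q, small=(0, 1)):
        """Loop over the q candidates g for pi(s); for the right guess, the
        hypothetical errors pi(b) - pi(a)*g all land in the small set."""
        small_set = {v % q for v in small} | {(-v) % q for v in small}
        survivors = []
        for g in range(q):
            if all((pb - pa * g) % q in small_set for pa, pb in samples):
                survivors.append(g)
        return survivors   # ideally a single surviving guess

Okay, so now finally, there's a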
slide with my work. So I'm glad that we come to this point. So now let's assume a different kind
of vulnerabilities that can happen. Let's assume that there's a prime ideal Q in R, and the Q has
residue degree F larger than 1, maybe 2. So this may not mean a lot, but basically it says that
you have a reduction map from R to this quotient ring, which is isomorphic to the finite field of
Q to the F elements, so this is not a prime field, but rather an extension of FQ. And we assume
that the error is more likely to lie in FQ than the uniform distribution. So for example, if F is 2,
then the uniform distribution would yield a probability of 1/Q, because FQ-square has cardinality
Q-square. But if for some reason the error is more likely to be in FQ, then we can detect this by
basically the same idea, like you loop over everything in here, and then you try to compute pi of
E. So that's -- and then you use some kind of statistical test, like chi-square, or you compute the KL divergence, or there are various ways you can check for uniformness. If it's not
uniform, then you know your samples are RLWE, and I think I forgot to mention that these two attacks are on the decision problem.
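For the uniformity check, a chi-square test is a few lines of Python; this sketch assumes scipy is available and treats the hypothetical errors as residues in a field of num_classes elements.

    import numpy as np
    from scipy.stats import chisquare

    def looks_uniform(values, num_classes, alpha=0.01):
        """Chi-square test of observed residues against the uniform distribution;
        rejecting uniformity is the signal that the samples are RLWE."""
        observed = np.bincount(np.asarray(values), minlength=num_classes)
        _, p_value = chisquare(observed)    # null hypothesis: uniform counts
        return p_value > alpha              # True means consistent with uniform

But here this one can be applied to a class of number fields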
where we actually also proved a search to decision reduction, and we attacked the search
problem. But yes, the complexity is a big O of N times Q to the 2F for decision, and N-squared times Q to the 2F over F for search, so if F is 2, this is still relatively small. And there has been -- in a 2016 paper we are preparing, I think we can remove this F, too. Yeah?
>>: Is N still big? What is --
>> Hao Chen: So rough --
>>: 1,000, N equals 1,000, like 2 to the 10 or something?
>> Hao Chen: Yeah, so the goal is really to get N to 1,000, so in our paper, I think the maximum
we get is about 144, but I think in general we can get examples of arbitrary dimension.
>>: And what is -- can you give intuition for what it means, ideal [indiscernible] residue degree,
like an example?
>> Hao Chen: Yes, for example, if you take Q adjoined I, that's a quadratic field, and a prime ideal has residue degree 2 if and only if the prime is congruent to 3 modulo 4, so for example, if you take the prime
ideal 3, so the ideal generated by the number 3, then that has residue degree 2. So, basically,
residue degree just means that this quotient ring -- this quotient field is isomorphic to the finite
field of Q to the F elements.
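To make residue degree concrete before the examples, here is a tiny Python sketch for the Gaussian integers Z[i], using the classical splitting law just mentioned.

    def residue_degree_in_Zi(p):
        """Residue degree of the ideal (p) in Z[i] for a prime p: inert primes
        (p = 3 mod 4) have degree 2; split primes (p = 1 mod 4) and the
        ramified prime 2 have residue field F_p, i.e. degree 1."""
        return 2 if p % 4 == 3 else 1

    print(residue_degree_in_Zi(3))   # 2: Z[i]/(3) is the field with 9 elements
    print(residue_degree_in_Zi(5))   # 1: (5) splits, residue field F_5

All right, so that's the attack, but are there any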
examples? So there are some sub-cyclotomic fields we found by searching, and also one can
explicitly construct a family of quadratic extensions of cyclotomic fields that are vulnerable to
this attack. So here are some examples. Yes, so these are actually the improved runtimes from the second paper, so you can see this 144-dimensional instance, solved in a matter of maybe a little less than two hours.
>>: Sorry. When I said 1,000, I meant that's what we use in practice as a minimum.
>> Hao Chen: Yeah, yeah, but this table definitely can continue, and it actually won't -- really,
yes. I think in the complexity, what matters more is the Q, so if Q gets large, then because it has
this two times F factor, then it will mess up the -- the runtime will need to be longer.
Okay, and the other attack we had in the paper is very specialized onto the prime cyclotomic
field, so if we consider a prime P and a P cyclotomic field, then there are two attacks we derived.
So one of them is when we assume the error is discrete. So in this case, we can only attack when
the modulus is exactly equal to P, so this is a very special case. But when it happens, there is
some vulnerability, because then you also you have this F1 equals 0, so one can just directly
apply the previous attack in the paper by four authors. And also, in this case, there is a prime
ideal behind the scenes, and it's ramified, which is some technical term, but that means that we
can only attack decision but not search, but in this table, everything here, we can attack search.
And now we come to continuous error, and it turns out, if the error size is small enough, then we
can attack any modulus, so let's make no assumption on Q and suppose the error is continuous.
And then let beta denote this element zeta P minus 1, and we have this map that -- so you can
think of the domain of this map as the Euclidean space RN, so that's basically where the error
lies, and we take it and map it to R, by just taking the first coefficient of a polynomial in beta, so
here everything can still be represented as a polynomial in beta, because R is equal to Z adjoined
beta, so powers of beta form a power basis for the ring. So then very curiously, under this map,
you take the prime ideal P. Then it will go to P times the integers. Well, this is just because the
prime ideal P is generated by beta. And also, we know something about the dual, so here, the
work I'm doing is trying to directly attack the dual setting in the original paper of RLWE. And
you have an error. You can scale it so that the second coordinate lies in this RN, modulo Q times the prime ideal. And then this part will vanish if you apply the map rho to it, because once you scale it, A times S becomes -- A times S lies in the prime ideal frak P. So then you can test for
uniformness, because if you start with uniform samples, you're going to have some uniformly
random dots in a circle, and the circle is R mod PZ, or R mod Z if you want to scale it by P. So
if your error happens to be small enough, then you can check whether the samples you get are
uniform, so this is the same idea, so what's kind of surprising is that it actually works for small
errors. So here, I want to comment that, remember, if we recall the LPR security proof, then
here you will really want these numbers to be about square root of N for their security to apply.
So that means that the parameters we can attack here is very much below their security
parameters, so we are not breaking their assumption or solving a hard lattice problem, but
this is a concrete way of exhibiting what is the security situation if your error is not large enough.
So here, if your error is square root of N smaller than what they claim, then it's very easily
breakable, and you can see the runtime. Even though you have a 1,000-dimensional lattice, then
this runtime is very fast. But this only attacks the decision problem. So now we have surveyed
these all kinds of attacks. Maybe I'll give kind of a summary of what the security situation is
currently, so first, let's talk about continuous errors. So in that case, I think although in the
original paper, they didn't do this, still, to compare different instances, you have to
multiply and normalize your error rates, because you're dealing with different lattices, and their
sparsities are different, so it doesn't really make sense to compare the error rates without talking
about the lattice and how sparse the lattice is. So here, I'm basically rescaling by the lattice co-volume, and the LPR paper proved this claim for cyclotomic fields where you have polynomial modulus, totally split, which is basically the same as saying Q is 1 modulo N, then
ignoring this factor, because it's really small, but if you have square root of N error, then it's
guaranteed to be as hard as some lattice problem. And also, if you have any field and polynomial modulus, then you are required to have error at least the discriminant -- the root discriminant of the number
field. And that's going to guarantee the security, so notice, this depends very much on the field,
so there's no a priori way to say. And then our attack is that if you have prime cyclotomic field
Q zeta P and your normalized error weight is about 1, then this is not -- the decision version is
not secure. And if you have time to expand this table, then for any number fields, I think that if
you have anything that's smaller than this, then the security is not clear. And now, for discrete
errors, I think although discrete errors is not -- it's used in practice, but surprisingly, there has not
been a lot of papers in the literature talking about their security, so they have this kind of
rounding results, but they always assume that you already have a continuous model. And also,
people do acknowledge that if you discretize a continuous error, that's not the same as directly
sampling from a continuous -- sorry, directly sampling from a discrete distribution. And in
practice, that's what people do. You don't do this extra rounding step. So here are
some summaries, so maybe yeah, all of them are basically already described in previous papers.
I'm just trying to put it all together to see that -- so with polynomial modulus and the normalized
error around 1, there are a lot of instances where this is not secure. But what happens is
that if you take cyclotomic field and if you take a modulus that's totally split, then even if you
have small error, then it's still likely to be secure. So this is something that's not in a table, but
we're just saying that -- so from this table, I think one takeaway is that for general number fields,
the security very much varies with the field structure and the modulus and the error, but for
cyclotomic fields, the situation might be much more simplified. So yes? And some future work,
so as I just mentioned, there's a possible way, and I just had this idea a week ago, of proving the
hardness of RLWE problems underlying real-world homomorphic encryption schemes. So for
example, like the BGV scheme used in IBM's HElib library or the YASHE scheme and YASHE
prime used in Microsoft's SEAL library. So there, you really have a lot of assumptions on the
parameter choice. You're using the cyclotomic fields, or maybe you're even using the two-power
cyclotomic field. Your modulus is completely split, and you're using the non-dual ring and
discrete error and small error, small secret, so everything is very restrictive. So in that case, I
think one might be able to prove that in this case, RLWE is hard, without reducing to the lattice
problems. So you can show maybe using information theory that decision RLWE is equivalent
to RLWE mod Q, so it's another technical term, but basically, it says that if you take the error
mod Q, like we did in the previous slides, then it becomes uniform or not. So the two problems
are likely to be equivalent, and then you want to show that this is impossible or this is hard by
showing that the error modulo the prime ideal frak Q is pseudorandom in that any attacker cannot
distinguish it from the uniform distribution with a non-negligible advantage. So there are two
possible ways to approach this. One of them is using Regev's smoothing parameter, which he
introduced in his joint paper with Micciancio in 2004. And so this turns out to be very
useful, and I think I can use it to provide some kind of upper bounds on the statistical distance.
And another possible way is to use Fourier analysis and properties of this Jacobi theta function.
So this is kind of more from a number theoretical point of view, but you want to prove that
certain distributions on a finite field are close to uniform, and that statement can be translated into
showing that the Fourier transform of the distribution is close to the delta function, because the
Fourier transform of uniform is delta, and then you want to bound the difference between that
new function and delta, so that actually -- the transform, the Fourier transform, turns out to be
this function in the two-power cyclotomic case, so I think it's kind of surprising and exciting.
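As a rough numerical companion to this idea, here is a Python sketch that measures how far a discretized Gaussian reduced mod q is from uniform; the tail truncation and the width convention exp(-v^2/s^2) are illustrative choices, not the exact setup on the slides.

    import numpy as np

    def stat_distance_mod_q(s, q, tail=12):
        """Statistical distance between (discrete Gaussian on Z) mod q and the
        uniform distribution on Z_q, with the Gaussian tail truncated."""
        support = np.arange(-tail * q, tail * q + 1)
        w = np.exp(-(support.astype(float) ** 2) / s ** 2)
        w /= w.sum()
        mass = np.zeros(q)
        np.add.at(mass, support % q, w)     # fold the mass onto the residues
        return 0.5 * np.abs(mass - 1.0 / q).sum()

    for s in [1.0, 3.2, 257 ** 0.5]:
        print(s, stat_distance_mod_q(s, 257))   # the distance shrinks as s grows

So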
I just have some very rough sample results from these preliminary calculations, but if you have
the dimension equals 128, your modulus is 257, and your error rate is about -- sorry. This is the one-hour mark, and this is my last slide, so I think I'm not too much over time. So this is the error
weights, and this is a very rough, very loose lower bound of the number of samples you need to
distinguish RLWE mod Q and uniform mod Q. So you can see that even if you are -- so here,
maybe N is small, so there's no real difference between square root of N and 1, but if you let N
be larger and use a better analysis, I think it is possible to do more. So that concludes my talk,
and thank you very much.
>> Kristin Lauter: Questions?
>> Hao Chen: Yeah?
>>: So overall it seems like the takeaways for designing the system, you have to be very careful
in choosing your field, because choosing the wrong field is definitely bad, and it looks like
there's a substantial gap in the errors. If you make your errors very small, that's also bad, but if you make them what people are using today, that seems probably okay?
>> Hao Chen: Yeah, yeah.
>>: Are they about -- do you expect that's about a lower bound, or do you think there's a lot of
room where you can use smaller errors, but not as small?
>> Hao Chen: So I think the errors people use today, so in practice, people usually use
cyclotomic fields, so that's good. They don't really try a random new number field, so the error
they use, I think it goes between a constant and root N, so I think -- so root N is really in this
continuous analog, they expect root N to be secure. But in a discrete world, my work, I think
what I'm trying to show is that if you have something smaller than root N, it might still work. But what people are actually doing, I think, is using small constants. But there's a
difference between asymptotics and actual instances, so if you're using a 1,000-dimensional
lattice, then the difference between root N and 1 is still not large. But here, I think what I can do
is instead of giving asymptotics, producing an explicit number that you need to go above to
ensure security. But for the schemes they use in practice, once I have this machine finished, I'll
just plug in the numbers, but yeah, that's in the next step.
>>: If you had to guess, place a bet, would you say that the values currently being used are
insecure or are too conservative or about right?
>> Hao Chen: I think they're secure. Yeah.
>>: I just can't resist making a comment on that point, though. There's a huge difference
between what IBM does and what we do. So they use not only small error but also small secret
and sparse secret. And it's the small secret and sparse secret that have potential weaknesses. We
don't know them. But the small error also is an issue. And they get a huge amount of efficiency
from those combination of those three things, which we don't do.
>>: So you think they may be too aggressive?
>>: Yes.
>> Hao Chen: Yes, so IBM --
>>: And they also use general cyclotomic fields, not two-power cyclotomic fields.
>> Hao Chen: Yeah. I think maybe a next step for me is to prove it for two-power cyclotomic
fields. For general cyclotomic fields, this, yeah, we need to modify these steps. But I think it's
still doable, but I think it's more complicated.
>>: Yes, so I think the point is that actually, we were using in practice very small standard
deviation for the error, like 3-point-something, but there are really no clear attack paths that
would make that vulnerable in any way. Like, we don't know -- there's no clear reason why 3-point-something would be any less secure than 100 standard deviation or 1,000. The only case
where I know that there would be something there is binary, because then you can use some
other tricks. But even then, it's not like a total break. It just reduces the security by a little bit,
and then also it's true, like Hao said, the security reduction, even in Regev's original paper, it
works when the standard deviation is roughly square root N. It has to be at least square root N?
>> Hao Chen: Yes, yes.
>>: But then there's also like this other requirement, because like you pointed out in one of your
slides, the gap SVP that you get from that is like the gap is like NQ divided by the standard
deviation, roughly. So this NQ divided by the standard deviation must be small, so this means
that in fact the standard deviation has to be -- this NQ over standard deviation must be like
polynomial in N, and this means that this standard deviation that's dividing NQ must be like size
Q, so in fact there is this kind of hidden thing there that you need for the security reduction to
make any sense, that the standard deviation is linear, basically, in Q, but also bigger than square
root N, right?
>> Hao Chen: Yeah, yeah. For square root of N, it's sort of the literature lower bound, so here
you are secure, and there are a lot of attacks when your error is super-small, so here you're
insecure, so in between there is nothing to date, but I think there is a lot of room to work on.
>> Kristin Lauter: Any other questions? I'll ask another one. So as you kind of pointed out, if
you think of the rings that could potentially be used, either two-power cyclotomic rings, which is
what Microsoft's implementation uses, you have the general cyclotomic rings, which is what
IBM uses, and then you have all the other rings. And from a lot of the results you've shown here,
there's really no reason to venture out into this area of all these other rings, because we already
see all these problems. So we've got good efficiency here with these two cases. So we can kind
of disregard all of these rings from a practical point of view, if you're an implementer. And now
you've got these two choices, IBM and the Microsoft choices. And you talk about this kind of
informal result from LPR that says why you think you can go back and forth between this
continuous and discrete error, so can you comment on the difference between the two-power
cyclotomic versus general cyclotomic, of going back and forth between discrete and continuous
error distribution?
>> Hao Chen: Yeah, yeah. So in the LPR paper, even informally, they didn't -- they have to
require the error to be large enough, like sort of at least square root of N, for there to be any
relationship between continuous and discrete, because the rounding process, the noise produced
by the rounding process is really not controllable by the error distribution itself. It's a function of
the field and the modulus, actually, and the secret. So the thing is, this internal noise makes the
reduction hard, so I think both in the IBM implementation and in the Microsoft implementation,
the errors they use are below the security threshold. So that is one of my main motivations for
this, because two-power cyclotomic fields, instead of general prime cyclotomic fields, which
IBM uses, are more likely to be secure under this parameter range. For prime cyclotomic fields, I
think it's not clear. The reason is that the way of sampling they use is -- also, I think the way of
sampling they use is not exactly the RLWE distribution, so I think they sample from some
polynomial basis, but that would be a different distribution. So there's basically no hardness
results that apply to their case, so you can't say -- at least right now, I can't say whether the IBM scheme is secure; I don't have any guesses. But for the Microsoft implementation, I have confidence that
even within their parameter range, it is still secure and it's possible to prove that. And that's
because there's a big difference between two-power cyclotomic field and general cyclotomic
field, and that the two-power cyclotomic field is kind of the most classical case, where
everything is reduced to a polynomial basis, and it just won't produce any weird geometry. So I
think that's the main difference.
>> Kristin Lauter: Any other questions? Thank you, Hao.
>> Hao Chen: Thanks.