>>: Well, I'm very pleased to start out the fourth day of this workshop. And this
morning our first talk will be from Junfeng Fan of Katholieke Universiteit at
Leuven on ECC on small devices.
>> Junfeng Fan: Okay. Thank you. So good morning, everyone. I didn't expect
so many people after the banquet, hung over. [laughter]. So I would like to
thank the organizers for inviting me. You know, mathematicians and engineers
don't sit in one room and discuss what to do. Normally it's all ready and
someone comes to us and says please implement it. Just do it. It's just a multiplier.
Why are you hesitating? Just please do it.
So I would like to take this opportunity to give you a short report on what we are
doing and what are the challenges, what are the problems we are encountering.
And I mean, maybe we are doing it in an inefficient way or probably even the wrong
way, or you come up with a better idea. So that will be great.
So the topic of the presentation is ECC on small devices. So the first question
you want to ask is what is a small device? So what are we talking about? So is
iPhone a small device? Well, you can hold it in the hand. But iPhone has a
processor, 32 bit processor running at 600 megahertz. It has up to 32 gigabytes
storage. It's not a small device.
So this is a small device. So maybe you can not read it. It says TPM 1.2. That's
the trusted platform module. It's produced by Infineon. If your computer was
produced after the year 2006, it's probably -- you probably have it in your
computer, but it's not switched -- normally it's shipped -- it's deactivated. And the
basic thing it does is after you push the power on button, it checks the BIOS, the
boot loader to make sure that the configuration of your computer is the same as
the previous one and then -- I mean they normally use hash function to do it and
then to check the hash value of your configuration is not changed. And then you
-- then you go to the next step, you know, to load the middleware and so on. So
by doing this, you can achieve certain degree of trust. So it's called trusted
platform module.
Microsoft is one of the partners of the -- a member of this Trusted Computing Group.
The second one I'm going to show you is a magic card that, you know, you put it
somewhere in the machine and then you get goods or service and it gives you
the illusion that you actually didn't pay for it. [laughter].
So this thing here is a smart card. So it basically includes some information of
your account and, you know, some signature and certification and so on. So
currently it's using RSA.
And the third example to show you is an RFID tag. It's proposed to replace the bar
code. So it's a very small device.
So why do we want ECC on this thing? So currently this one has RSA
coprocessor and this also has an RSA accelerator. But it is suggested that we
should use ECC. But RSA currently works fine in these devices. So ECC is a
candidate, but when are we going to switch to ECC, it's not clear yet.
But this device, it's very small, so you see this thing here is very -- is a teeny tiny
device. If you want to put crypto there, RSA -- ECC is no longer an alternative, it's
probably the only choice you can use. So I will describe a little bit more why
we want ECC on this kind of device.
So an RFID tag has two parts. Basically you have a small chip here that
basically stores a bit stream. It's identification. Essentially the ID of the tag. And
surrounding it are wires, which is basically the antenna. So how does it work?
So this thing has no battery, so it has no power supply itself. So when you want
to read the data from this small chip, you establish an electromagnetic field and
the antenna will, you know, will harvest the energy and store it on a capacitor and
then do the computation and also the communication. So it's a very small
device. It's very cheap. This thing is so small that it's a challenge to attach it to
the antenna.
So it's now widely used, for example for supply chain management, for inventory
tracking and so on. So if you go to a pharmacy, you probably have it here on the
bottom of the bottle. So you know, by doing this, the people working in the
pharmacies don't, you know, mix up the drugs, you know, don't give you the
wrong drugs.
And in the future it probably gets more intelligent like it tells you when this was
produced and how many pills you should take every day or it actually even
reminds you that you should take your pills now.
So the second application is for pet identification. It's actually mandatory in some
countries that if you have a dog, a cat, you actually have to put an RFID tag
inside the body, implant it. So if it's lost, you know, you can easily find it. If
someone finds it, they can easily identify who is the owner of this lovely little thing.
A third application of this is for anti-counterfeiting. So this is a ticket for the
Shanghai Expo this year. It actually has an RFID tag inside. So when you go to
the -- you know, the Expo, you show your ticket, they read it with a reader and
they make sure that this one is authentic, it's not a fake one. So you see that for
these kinds of applications RFID has to be unclonable. Otherwise there is no
point in doing this.
And the third -- the fourth application I'm showing here is a mobile payment. So
you can actually put a mobile -- a bank card inside the RFID and attach it to your
cell phone so you can use it for public transportation or you can buy something in
the vending machine. So you no longer need to carry your bank cards, you carry
a phone. And everyone has a phone all the time. So that's going to be a very
nice application in large scale.
So but there's a little problem. You heard this anti-RFID protest from time to
time. The problem is privacy. So if someone is standing in front of you and you
don't know him, it's a stranger, he's basically anonymous to you, you have no
information about him. But if he is carrying some RFID tags and you take your
reader, okay, yeah, you see you probably can read the ID, the e-ID card that he's
carrying, his name, his birthday and the country where he's from. He probably
doesn't want to reveal this information to you.
Or, you know, he's driving a Porsche and in some country or someplace this may
cause him trouble, you know. Some criminals may say this guy is rich, let's rob
him or something. Yeah. He probably doesn't also want to reveal this
information. Or maybe he's carrying some pills that indicates that he has some
disease and this is not good when he's going to a job interview. So we want to
use this technology but we want to keep the privacy.
RFID tag system is, you know, it's one reader or many readers with many tags.
So what makes a good RFID tag? It works. That's the first requirement. It's
cheap. That's -- you know, if you talk to anyone who works in this industry, it has
to be cheap. Otherwise nobody's going to use it. It's secure. So this is a little
bit, I mean, ambiguous. So what does secure mean in this context? You know,
they have some sort of definition for this. Secure means that only an authentic tag
-- a legitimate tag can authenticate itself to the reader through a matching
conversation. So it means that if you change the conversation or, you know, you
fake the tag, you should not be able to authenticate yourself to the reader.
It's untraceable. It means that the adversary should not be able to identify a tag
from the previous sessions. And it's scalable. That means, you know, you have
many tags. It should be easy to add or to remove tags in the system. And it's
reasonably fast. You know, you don't want to stand in front of the vending
machine for one minute to finish the reading, right?
So let's look at the requirements. It's cheap. That means it has small area.
Because every gate -- every single, you know, gate in the circuit costs you some
amount of money. So it has to be small. It has to be secure. And that's where
the crypto jumps in, right? We need cryptography to ensure these properties. It
has to be untraceable and it has to be scalable. If you look at this topology, it
looks very nice for public-key cryptography applications. So there are studies on
this and they show that if you want to ensure this property, you actually really
need public-key cryptography. And it's fast. So it has to be lightweight. So, you
know, the complexity should be low.
If we put all of them together, that somehow maybe, maybe we can use ECC for
it. All right. Okay. Let's take a protocol as an example. So this one is proposed
by Schnorr in the year 1989. It's an identification protocol. So it is suggested to be
used in an RFID system.
So the tag has secret information. It's a private key X, and the reader has its
public key stored on the reader. So what the tag does, tag does every time it
generates a random number and it computes -- it does one point multiplication
and send this R back. And the reader generates one random number and sends
it to the tag and tag performs this operation and sends this V back.
So then the reader checks if this equation holds. If it holds, that means the
tag really knows this secret X. So it works. But there's a problem. You see that
all this information is transferred over the air, so any adversary can eavesdrop. So
if you have these three, you can easily derive the public key of the tag, so
every time you just try to compute this big X; if it's the same, that's the same tag.
So it can be traced.
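
The tracing problem described above can be sketched in a few lines. This is only an
illustration under stated assumptions: a small multiplicative group modulo a toy prime
stands in for the elliptic curve group, so g^x plays the role of the point
multiplication, and the parameters and function names are hypothetical, not taken from
the talk.

```python
import secrets

# Toy parameters (assumptions): p = 2q + 1 with q prime, g has order q.
P_MOD, Q, G = 2039, 1019, 4

def keygen():
    x = secrets.randbelow(Q - 1) + 1          # tag's private key
    return x, pow(G, x, P_MOD)                # (x, public key X = g^x)

def run_session(x):
    """One identification run; returns what an eavesdropper sees on the air."""
    r = secrets.randbelow(Q - 1) + 1          # tag: fresh random number
    R = pow(G, r, P_MOD)                      # tag -> reader: commitment
    e = secrets.randbelow(Q - 1) + 1          # reader -> tag: nonzero challenge
    v = (e * x + r) % Q                       # tag -> reader: response
    return R, e, v

def reader_verify(X, R, e, v):
    # The equation the reader checks: g^v == R * X^e
    return pow(G, v, P_MOD) == (R * pow(X, e, P_MOD)) % P_MOD

def eavesdropper_public_key(R, e, v):
    # From (R, e, v) alone: X^e = g^v / R, then take the e-th root mod q.
    x_to_e = pow(G, v, P_MOD) * pow(R, -1, P_MOD) % P_MOD
    return pow(x_to_e, pow(e, -1, Q), P_MOD)

x, X = keygen()
s1, s2 = run_session(x), run_session(x)
# Both sessions verify, and both leak the same public key, so the tag is traceable.
print(reader_verify(X, *s1), eavesdropper_public_key(*s1) == eavesdropper_public_key(*s2))
```

The modular inverse via pow(R, -1, P_MOD) needs Python 3.8 or later.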
So new protocols have been proposed. There's another one proposed by Vaudenay in
the year 2007. So the idea is like this. So I'm not going to go through the
protocol again. But the basic idea is that the tag also does one encryption but
the encryption is a randomized encryption. So it's a probabilistic encryption, it's not
deterministic. And it claims that if the encryption algorithm, so the public key
crypto system, is indistinguishable, you know, under a chosen-plaintext attack, this
scheme is secure, you know, it's narrow-strong private.
So strong and narrow are like the way you classify the attacker. So strong means
that an adversary can open the tag, can read the ID, the key out, and close the
tag and put it back in the system. So that's a really strong adversary. And
narrow means the adversary does not know whether the reader accepts it or not. So
he doesn't know the decision of the reader. So if the adversary falls into this
category he's a narrow-strong adversary. So this scheme being narrow-strong
private means that even if the adversary is this strong, he still cannot trace the tag if
the encryption system has this property.
So this sounds nice. We can implement this. But, you know, you need to
choose public key cryptography to do this job and should be efficient and should
be securely implemented.
So okay. Let's make an ECC for this. The ECC chip, you know, it's -- when you
make an ECC processor for these kinds of applications you always
have some constraints. For example, you should have small area, as I
mentioned before, and you should use smaller arithmetic logic unit. You should
reduce the storage to make this small. And you should take energy into account
when you make the chip.
And the performance is another constraint, so you should ensure a reasonably
fast computation. And also there are sophisticated physical attacks on
the chip. So we should make the chip secure enough to prevent these attacks.
So I mean I will describe a little bit of the design flow of hardware. I mean, since
most of the audience here are mathematicians.
So when we design the hardware, we start from the design capture. We want to
make an ECC processor, okay. That's the design capture. You can write it in C
or C++ or Magma or whatever you describe your -- the thing that you want to
implement. And then you write -- you transfer it to a lower level language.
Normally it's called HDL, which is hardware description language. In many cases
it's Verilog or VHDL.
And then you do synthesis. It's like when you write C code you do a compile, you
know, it generates assembly code for you. So up to here it's only structural. So
there is no physical information in the implementation. So normally you write the
HDL, you do synthesis and you do you know some simulation. If it doesn't work,
you go back, you go several loops until you're satisfied here, and then you give
this implementation to the back-end designers so they do floorplanning, they do
placement and routing. Eventually they give you a circuit -- a layout. So that's
how it is designed.
So okay. This is the layout of a circuit. So you see -- you couldn't see much
here basically. There are many logic cells connected and this thing is sent
normally to Taiwan or Singapore or somewhere to be produced and you'll get the chip
back.
So you see that when this thing gets large it's really impossible to do it by hand.
So that's why we have -- we have -- so from here all these things are done by
tools, by software. So you don't really see what, you know -- assembly code you
can still trace it to see where is this thing installed or a pointer and so on. But for
hardware it's somehow difficult in the end to somehow correspond something
here to the thing you have here.
Okay. So some terminology. Gate equivalent, that's the unit we use to count the
area in hardware design. So Gate equivalent is actually to measure how many
equivalent NAND gates you used in your design. A NAND gate is something like
this. It's very simple. Y is the inverse of A times B. So that's the smallest logic
gate in the library that we have. So we use it as a unit to measure the area.
And also with this gate actually you can construct any function. So this is the
memory cell in hardware. So you use it to store information. Roughly the area
of this thing is six gates. So the size of this thing is six times this. Roughly.
Depends on the library.
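
As a small illustration of the remark that any function can be built from NAND gates,
the sketch below composes NOT, AND, OR and XOR from a single nand() helper. The function
names are made up for this example; the gate constructions themselves are the standard
textbook ones.

```python
def nand(a, b):
    """Y = NOT (A AND B) on single bits (0 or 1)."""
    return 1 - (a & b)

def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))

def xor_(a, b):
    t = nand(a, b)                      # classic four-NAND XOR
    return nand(nand(a, t), nand(b, t))

# Print the truth table of every derived gate.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_(a, b), or_(a, b), xor_(a, b))
```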
So why is this interesting? Because now you see it's very important to reduce
the storage. Let's compare RSA and ECC. Let's say ECC 163, and we say that
we can reduce the storage to 6 elements -- 6 times 163 bits, so that
makes 978 bits to be stored in the hardware.
And if you take RSA that has the equivalent security, you need two elements in
this group, so you need to store 2048 bits. That's significantly larger than ECC.
So in applications like this, this is a huge difference. If you can use ECC you
probably don't want to use RSA.
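
The storage comparison above is simple arithmetic, sketched here. The six gate
equivalents per stored bit is the rough figure quoted in the talk. The 1024-bit RSA
element size is an assumption, inferred from "two elements" making 2048 bits.

```python
GE_PER_BIT = 6  # rough area of one memory cell in gate equivalents; library dependent

def storage_bits(num_elements, bits_per_element):
    return num_elements * bits_per_element

ecc_bits = storage_bits(6, 163)     # six field elements for ECC-163
rsa_bits = storage_bits(2, 1024)    # two elements, assuming a 1024-bit modulus

print(ecc_bits, ecc_bits * GE_PER_BIT)   # 978 bits, 5868 GE
print(rsa_bits, rsa_bits * GE_PER_BIT)   # 2048 bits, 12288 GE
```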
And okay. So ECC looks better in this scenario than RSA, so how about hash
function? You know, you can also use CubeHash function to do authentication
and so on.
Let's take a look at these SHA-3 candidates. So there are still 14 left in the second
round. If you look at how many bits they have to store in the memory, it's actually
not much less than ECC. Of course this is -- we take 256 bit digest, so it's -- the
security level of this thing is a bit higher than this. But some of them are using
much more bits, you know, so much more bits than ECC. Some are less. So
hash function is actually not so far if you compare the size of ECC.
Okay. So hopefully by now I convinced you that ECC is a good candidate for
these applications. So we want to make an ECC processor. We have to make
some decisions. What field we should use. And then we can use binary field, we
can use prime fields. How many bits do you want to secure your RFID tag? And
what coordinate systems -- you know, how do you represent your element,
polynomial, normal or others? And what kind of architecture? And what kind of
physical security properties you want.
So the first thing we look at the difference between binary fields and prime fields.
So this is the adder for the -- in the Galois field. So you basically only need to do
one XOR operation. And this is a bit addition in the prime field. So you see this
thing is significantly larger than this one. And a multiplier is basically, you know,
built with a bunch of adders. So it's not so difficult to say we should use binary
fields.
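
The difference between the two adders can be sketched in software. In a binary field,
addition is a bitwise XOR with no carries. In a prime field, every bit position needs a
full adder with carry logic, and then a reduction step. The helper names are
hypothetical and the code models the logic only, not an actual circuit.

```python
def gf2m_add(a, b):
    """Addition in GF(2^m): one XOR per bit, no carry chain."""
    return a ^ b

def ripple_carry_add(a, b, width):
    """Integer addition built from one-bit full adders with carry propagation."""
    carry, total = 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total | (carry << width)

def fp_add(a, b, p, width):
    """Addition in GF(p): carry-propagating add, then a conditional subtraction."""
    s = ripple_carry_add(a, b, width)
    return s - p if s >= p else s

print(bin(gf2m_add(0b1011, 0b0110)), fp_add(11, 6, 13, 4))
```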
And the second one is security. And so we have a lot of choices. So ECC 131
bits, if we implement this, I guess Tanya will say that's a silly idea tomorrow,
right, because this thing is practically solvable, right? Okay. Let's move to ECC
191 and then your boss will say that's a silly idea. Because if ECC 163
can do the job, why do you use 191? I mean, that's a waste of money. So, yeah,
we use this one. And if you look at the papers so far for ECC on RFID tags, most of
them use -- I think all of them use this. That's the reasonable choice.
So next question is what coordinate system you should use. If you take a look at
this -- I didn't I mean list all of them here, but just to give an idea. So you see
that for affine coordinates, for each point you need two elements to
represent it. And the thing is that you need an inversion for each key bit. And that we
don't like. It's very expensive.
For projective coordinates you don't have inversion. You probably need one in
the end, but you need three elements. And we know that storing 163 bits is very
expensive. So it's also not so interesting.
And with Lopez-Dahab you actually can use only X coordinates. So it's very nice.
You only need one element for each point. But in the affine
version you also need inversion, and in the Lopez-Dahab projective version you'll need
two elements. But you only need one inversion if you use the Montgomery Ladder.
For Binary Edwards Curves you have something similar. That's the
W-coordinates in affine version and projective version. So we use Lopez-Dahab
projective coordinates. Less inversion, less storage. That's the choice.
So let's look at this Lopez-Dahab coordinate system. This is the group law for
addition and for doubling. So you have two points. Each one has two
coordinates. If you count it, you need seven registers. Four of them are used
to store these two points. One is used to store the base point, and you need two
temporary registers. And for doubling, something similar, but you need three
registers. And this is only possible when you use Montgomery Powering Ladder.
So how about further reduce it. There is one trick to do this. So basically we say
that we make the Z coordinates of the two points the same. So after the -- you
know point addition, point doubling, you somehow cross-multiply the X
coordinates, Z coordinates, and you unify the Z coordinates for both points. So by
doing this, you only need one Z coordinate so you can reduce from 7 to 6. And it
seems that you need three more multiplications but actually it's not true.
Because you have made the Z coordinates the same so for the next iteration you
can save some operations.
So the penalty is that you move from 6 multiplications plus 6 squarings to seven
multiplications, 4 squarings for using -- when you use these common-Z
coordinates.
And to further reduce the -- sorry, the hardware cost, you can also simplify the
register file architecture. I guess this is probably not so interesting here. So
basically you -- for a normal register file you can access every register from
outside. And in this one you can only access A. And if you want to put
something in C, you first put it in A and you shift it. So you somehow simplify the
interconnection, which also reduces the area.
So that's the trick we played to reduce the hardware -- the area.
So the next question is power. So remember that this thing has no battery. So it
-- it, you know, harvests the energy from the field. So you need to reduce the
power consumption of this thing such that the energy you get can, you know,
let you finish your computation and do the communication.
And also, the distance between the reader and tag also depends heavily on the
power consumption you use. You see that the closer the tag is to the reader, the
higher the energy density is, so you can have more power. So if you want a
larger distance, this thing, you know, has to be using even lower power
consumption.
So just to illustrate how the power is consumed in a circuit, and this is the
inverter. So when the input goes from high to low, the output from -- goes from
low to high. And the other way around is the same. So when the input goes low
to high, the output goes from high to low. So that's like the basic, you know,
element in the circuit. That's what we learn in the first class for digital circuit
design.
And the power -- so you see that -- when the input goes from high to low,
the output goes from low to high. And that's when you charge the capacitance
here. And when it goes from low to high, you discharge it so it goes from -- like
this. So when you charge or discharge the capacitance, that's where the power
is consumed.
So if you count how much power is compute -- is consumed, it depends on four --
basically four factors. The first one, alpha, that's the switching activity. So you see
that the more you switch, the more power you consume.
The second one is C, which is the output capacitance. It's basically the value of
this thing. So the larger this is, you need more power to charge it. The third one
is the voltage. So the supply voltage. Obviously the higher it is, you know, it
consumes more power. It's also -- it has larger impact than the others because
you have a square here.
And the last one is frequency. You know, the higher the frequency,
the more power you consume. So just a side note: why is Intel
moving from -- you know, moving to multi-core systems instead of continuing to
increase the clock frequency of their processors?
And one of the reasons is that the power density's just too high. So if you go,
you know, to 10 gigahertz or so on, you know, keep increasing the frequency,
then it consumes so much power that the metal wire inside the circuit will melt.
So it doesn't work in that way. So what do you do? Okay. Let's just keep the
frequency at one or two gigahertz and we make multi-cores and we just copy and
paste it, 4 cores, 8 cores. And we know it's difficult to program it, but that's not the
problem of Intel, right? [laughter].
So that's basically -- okay. Another reason is also for verification. So when the
circuit gets too large, it's almost impossible to fully verify the circuit. So instead
you just make a -- you know, a circuit that you can fully verify but you copy/paste
and then you parallelize your applications. So we somehow do the same thing
here. And the idea is basically the same.
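
The four factors named earlier (switching activity alpha, output capacitance C, supply
voltage squared, and clock frequency) combine as a product in the usual dynamic power
relation. The sketch below assumes that standard form; the numeric values are
placeholders chosen for illustration, not measurements from the talk.

```python
def dynamic_power(alpha, c_farads, vdd_volts, f_hz):
    """Dynamic power P = alpha * C * Vdd^2 * f, in watts."""
    return alpha * c_farads * vdd_volts ** 2 * f_hz

base = dynamic_power(alpha=0.1, c_farads=1e-12, vdd_volts=1.2, f_hz=1e6)
half_f = dynamic_power(alpha=0.1, c_farads=1e-12, vdd_volts=1.2, f_hz=5e5)
half_v = dynamic_power(alpha=0.1, c_farads=1e-12, vdd_volts=0.6, f_hz=1e6)

# Halving the frequency halves the power; halving the voltage quarters it.
print(half_f / base, half_v / base)
```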
If you look at the multiplier, this is a very simple multiplication algorithm. So you
want to multiply two elements represented in polynomial basis. You scan one
bit-by-bit and you accumulate and you do a reduction. So this is the circuit that
you use to perform one iteration of this. So you see that you have A of X here, it
comes and it's multiplied by one bit of B of X. So that's the multiplier. So
basically if b_i is one, A of X goes through; if b_i is zero, all of them are zero.
And then you add them. So this thing is XOR gates. You add them to C of X. C
of X is shifted one to the left. And you do one reduction. So that's a circuit.
So if you have this circuit, you have to use M clock cycles to finish the loop.
Now, in order to make it a bit faster we can actually use multiple bit-serial
multipliers, that's what we call them, bit-serial multipliers, concatenated
together, and you make a digit-serial multiplier. So now you can finish, let's say if
you have D bit-serial multipliers then every cycle you finish D iterations.
So you only need M divided by D cycles to finish it.
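
The bit-serial and digit-serial multipliers just described can be modelled in software
as follows. This is a sketch: the function name and the cycle counter are illustrative,
and the reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 is the one used for the NIST
163-bit binary curves, assumed here because the talk does not name the polynomial.

```python
POLY_163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1

def gf2m_mul(a, b, m, poly, d=1):
    """MSB-first multiply in GF(2^m); d bit-serial stages run per clock cycle.

    Returns (product, cycles). With d=1 this is the bit-serial multiplier.
    """
    c, cycles, i = 0, 0, m - 1
    while i >= 0:
        for _ in range(d):                 # D chained bit-serial stages
            if i < 0:
                break
            c <<= 1                        # shift C(x) one position to the left
            if (c >> m) & 1:
                c ^= poly                  # reduce modulo the field polynomial
            if (b >> i) & 1:
                c ^= a                     # add A(x) when bit b_i is one
            i -= 1
        cycles += 1
    return c, cycles

a, b = 0x123456789ABCDEF, 0xFEDCBA987654321
print(gf2m_mul(a, b, 163, POLY_163, d=1)[1], gf2m_mul(a, b, 163, POLY_163, d=4)[1])
```

With M = 163 the cycle count is 163 for D = 1 and 41 for D = 4, that is, M divided by D
rounded up.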
So why is this interesting? Now we want to make an ECC processor that can --
we say the target is we want to finish one point multiplication within 250 milliseconds.
Okay. So that if your protocol has two scalar multiplications, it's done in half a
second. You can change this. But basically that's what we set for the protocol
we want to use.
So now if you increase the digit-size, the digit-size is the number of
bit-serial multipliers in your multiplier, you see a drop in the -- you can drop
your clock frequency. So here we have five lines, and they are area, cycles,
frequency, power, and energy. So the first thing you look at is the frequency. Because
now you have, you know, from here to here you have a larger multiplier. You can
actually drop your -- the frequency to achieve the same delay.
So you see that from one to two to three you see a dramatic drop of the
frequency, which leads to a dramatic drop of power consumption. But if you
keep increasing the digit-size, you see that the energy consumption almost stays the
same. It's -- you don't gain much here anymore.
So that's where you stop. Because if you keep increasing the digit size, you
actually also add cost to the hardware. You make it larger. So this is a tradeoff.
At some point you say, okay, if we -- if we keep increasing we gain less but it
costs more. So that's what we're going to use. So we use digit-size 4 in this
application. Okay. That's the end of the power consumption.
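
The link between digit size and clock frequency can be estimated roughly. The model
below is a deliberate simplification and an assumption, not the speaker's numbers: it
counts only the field multiplications (seven per key bit, the common-Z figure mentioned
earlier), ignores squarings and control overhead, and divides the cycle count by the
250 ms target.

```python
import math

def required_clock_hz(digit_size, m=163, mults_per_key_bit=7, deadline_s=0.25):
    """Clock frequency needed to finish one point multiplication in time."""
    cycles_per_mult = math.ceil(m / digit_size)       # M / D, rounded up
    total_cycles = m * mults_per_key_bit * cycles_per_mult
    return total_cycles / deadline_s

for d in (1, 2, 3, 4, 8):
    print(d, round(required_clock_hz(d) / 1e3, 1), "kHz")
```

Under these assumptions the required frequency falls quickly for the first few digit
sizes and then flattens, which is the shape of the tradeoff described above.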
So we also need to deal with physical attacks. There are mainly two types of
physical attacks. The first one is side-channel analysis. The second one is fault
analysis. So, you know, like 15 years ago we believed that the circuit, the
cryptography circuit we make is a black box. You only see the input and output.
You don't see what's inside.
And this is actually not true. You know, this is a classic way to bypass the -- you
know, you don't really need the key, but you can use some other information to
test whether your guess is correct or not, like here it's sound, and in side-channel
analysis you use power or electromagnetic radiation or timing, you know, or temperature.
You use this to say -- to gain information from the chip. And another one is fault
analysis. I'll give an example for both.
So this is a typical setup, I mean simplified setup for power analysis. So you
have an ECC processor here, and you want to measure the current that -- that's
flowing through the ECC processor. So what you do, you insert a resistor here,
and you measure the voltage drop here. That's basically proportional to the
current that's going through it. And you feed it into an oscilloscope and you see
it on the screen. And we call this thing a power trace.
So a simple example is this one. This is a classical left-to-right binary method for
point multiplication. And you scan the scalar bit-by-bit and when it's zero you do
a doubling, when it's one you do doubling and addition. So easy to understand.
And if you look at a power trace it actually tells you the key stream. So -- because
you know when the key bit is one it takes more time, because it first has doubling
and then addition. So you see this part is longer than this one. And so this is the
-- this part indicates this bit is one. And this is zero and that's a zero, another
one, another one, another zero.
So basically if you -- if you can -- you only need one power trace to read out the
complete key. So it's very simple but it's very powerful. So we have to prevent
this attack.
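
The leak can be reproduced with a toy model. In this sketch the additive group of
integers modulo a prime stands in for the elliptic curve points (an assumption made to
keep the code short), and the "power trace" is idealised as the sequence of operations,
D for doubling and A for addition.

```python
N = 65521  # toy group order (a prime); stands in for the curve group

def double_and_add_trace(k, P, n=N):
    """Left-to-right binary method; returns (k*P mod n, operation trace)."""
    Q, trace = 0, []
    for bit in bin(k)[2:]:
        Q = (2 * Q) % n
        trace.append("D")                 # doubling happens for every bit
        if bit == "1":
            Q = (Q + P) % n
            trace.append("A")             # addition only when the bit is one
    return Q, trace

def recover_key_from_trace(trace):
    """Read the key from one trace: 'DA' means bit 1, a lone 'D' means bit 0."""
    bits = []
    for op in trace:
        if op == "D":
            bits.append("0")
        else:
            bits[-1] = "1"
    return int("".join(bits), 2)

secret = 0b1011001110001011
result, trace = double_and_add_trace(secret, P=12345)
print(recover_key_from_trace(trace) == secret)
```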
So we have a very nice algorithm, the Montgomery Powering Ladder. So you
always do point addition and point doubling, no matter whether the key bit is zero or one.
So it's balanced somehow. You don't see this -- you don't see this thing in the
power trace anymore.
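
A sketch of the ladder in the same kind of toy setting, with integers modulo a prime
standing in for curve points (an assumption for brevity). Every key bit costs exactly
one addition and one doubling, so the idealised operation trace no longer depends on
the key. As noted in the talk, this is balance at the algorithm level only.

```python
N = 65521  # toy group order (a prime); stands in for the curve group

def montgomery_ladder_trace(k, P, n=N):
    """Montgomery ladder; returns (k*P mod n, operation trace).

    Invariant: R1 - R0 == P throughout the loop.
    """
    R0, R1, trace = 0, P % n, []
    for bit in bin(k)[2:]:
        if bit == "1":
            R0, R1 = (R0 + R1) % n, (2 * R1) % n
        else:
            R0, R1 = (2 * R0) % n, (R0 + R1) % n
        trace += ["A", "D"]               # same operations for either bit value
    return R0, trace

k1, k2 = 0b1011001110001011, 0b1111111100000000
r1, t1 = montgomery_ladder_trace(k1, 12345)
r2, t2 = montgomery_ladder_trace(k2, 12345)
print(r1 == (k1 * 12345) % N, t1 == t2)
```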
But there's one thing we have to be careful. So this balance is actually in the
algorithm. So it's balanced, mathematically balanced. But that means it's
balanced up to here. But what we really want is that it's balanced to here. So
when you implement it, you can actually make something that, you know, screws
up the balance. So in the end you still see the difference in a power trace, and that
basically can be used to compromise the security.
More sophisticated attack is differential power analysis. So the basic idea is that
you have a real chip that has a key stored somewhere there and you have a
power model. So that's something used to model the power consumption of this
thing. So what you do is you run many executions, many point multiplications
with different base point and you measure the power, and that's what you get
here.
And then you make a key guess. You guess, okay, let's say the first two bits of
this key is 0, 1. Now, we only guess the first two bits. So you have only four
possibilities. You can really guess it. And then you do -- you give the same
inputs and you get the output here. But since you have power model, you can
actually give a hypothetical power consumption of the computation of the two K
bits. That's something here. Now you look at this, you know, and let's assume
this part is the end of the second point -- the second K bit. Then you check this
vector and this, you know, thing. If your guess is correct, then they have -- they
are highly correlated.
So you can use some statistics, you know, to compute the correlation of this
thing and this vector. And you can try different guesses, four of them. One of
them will have larger correlation than others, and that's the correct key guess.
And now you have the first two key bits and then you guess another two key bits.
You keep moving. And so that's how you recover the whole key. So also it's
very powerful.
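
The guess-and-correlate loop can be simulated end to end. Everything in this sketch is
an assumption made for illustration: modular exponentiation in a toy prime field stands
in for point multiplication, the power model is the Hamming weight of the intermediate
value after two key bits, and the "measurements" are that model plus Gaussian noise.

```python
import random

P_MOD = 65521  # toy prime modulus

def hamming_weight(v):
    return bin(v).count("1")

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5

def simulate_measurements(secret_top_bits, bases, rng, noise=1.0):
    """Stand-in for the oscilloscope: leakage of the real device plus noise."""
    return [hamming_weight(pow(b, secret_top_bits, P_MOD)) + rng.gauss(0, noise)
            for b in bases]

def best_guess(bases, measured):
    """Try all four values of the first two key bits; keep the best correlation."""
    scores = {}
    for guess in range(4):
        hypothetical = [hamming_weight(pow(b, guess, P_MOD)) for b in bases]
        scores[guess] = pearson(hypothetical, measured)
    return max(scores, key=scores.get), scores

rng = random.Random(1)
bases = [rng.randrange(2, P_MOD) for _ in range(1000)]   # different inputs
measured = simulate_measurements(0b10, bases, rng)
guess, scores = best_guess(bases, measured)
print(guess, scores)
```

A real attack would continue with the next two key bits, conditioned on the bits
already recovered.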
So in order to prevent this, you have to somehow randomize the data that you
are computing here, you know, or somehow completely make it constant, the
power consumption is just one line. You don't see a difference. But that's very
difficult. So there are many techniques to do this.
And now we move to fault analysis. So you can actually open the chip and, so
this is a typical setup, and this is a [inaudible]. You open the chip. You shoot it
with a laser. And that can somehow like flip some bits of the data you stored in
the memory or change the multiplication or so on. So it's possible.
So assume that we have a nice implementation with a key here and it's secure
against the power analysis. So what it does is that it takes the input P and
computes P -- K times P. It gives you the output. Now, if you look at the curve, this
is the curve actually we wanted to use. And we give the input P which is on the
curve, but if the adversary can, you know, change some bits of the Y coordinate to
make it another P, P prime, then this P prime is actually a point on
another curve, right? So you can find a curve that differs
from this one only in a6. And this a6 is not used in point addition and point
doubling.
So you can actually choose a Y coordinate here such that this point is on this
curve and this curve E prime is a weak curve. So you can use the output, and
the output is also a valid elliptic curve point multiplication result on this curve. So
you can use this output and this input P prime to solve the discrete logarithm problem
on this curve to get the key.
So, okay. That says we should verify the base point before we actually do the point
multiplication. So you check if the curve parameters are good or not affected and
you check if the point, P, is on the curve or not before you actually start a point
multiplication.
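
Point validation on a binary curve y^2 + xy = x^3 + a2*x^2 + a6 can be sketched as
below. The field polynomial is the NIST 163-bit one, and the curve is built around an
arbitrary point purely for illustration (it is not a standardized curve). The helper
names are hypothetical. The sketch also shows that a point with a flipped Y bit
satisfies the same equation with a different a6, which is the other curve the
adversary ends up on.

```python
M = 163
POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1

def gf_mul(a, b):
    """Bit-serial multiplication in GF(2^163)."""
    c = 0
    for i in reversed(range(M)):
        c <<= 1
        if (c >> M) & 1:
            c ^= POLY
        if (b >> i) & 1:
            c ^= a
    return c

def implied_a6(x, y, a2):
    """The a6 for which (x, y) satisfies y^2 + xy = x^3 + a2*x^2 + a6."""
    x2 = gf_mul(x, x)
    return gf_mul(y, y) ^ gf_mul(x, y) ^ gf_mul(x2, x) ^ gf_mul(a2, x2)

def on_curve(x, y, a2, a6):
    return implied_a6(x, y, a2) == a6

a2 = 1
x, y = 0x1234567, 0x89ABCDE            # arbitrary coordinates (illustration only)
a6 = implied_a6(x, y, a2)              # define the curve so that (x, y) lies on it

y_faulty = y ^ 1                       # model a single bit flip in Y
print(on_curve(x, y, a2, a6), on_curve(x, y_faulty, a2, a6))
print(implied_a6(x, y_faulty, a2) != a6)   # the faulty point sits on another curve
```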
But, okay, can the adversary actually inject a fault after the validation step? So you
do the validation, but right before you are going to do the scalar
multiplication, the adversary can inject a fault. So how are you going to deal with
this problem? But, okay, the question is, is that possible? How
likely is it that you hit a weak curve by injecting a fault here?
So there's an attack proposed two years ago at FDTC that makes use of twist curves.
So consider a curve defined on a prime field and, you know, as we mentioned
before, we don't want to use Y coordinates, so we only use X coordinates. Then
this curve has a quadratic twist. And a twist curve can be a weak curve. So now
we are using this X of P to represent the point. If the adversary manages to
insert some random fault on this X coordinate, actually he has a very high
probability to end up with a point on its twist. The probability is almost one
over two.
So you can actually hit a weak curve and the weak curve is its twist. And I think
this was not really -- I guess it was not really taken into account when NIST
recommended curves because some of the NIST curves also have this property.
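The "one over two" figure can be checked numerically on a toy curve. The sketch below assumes a short Weierstrass curve y^2 = x^3 + a*x + b over a small prime field (the parameters and the function names `classify_x` and `twist_fraction` are illustrative assumptions, not taken from the talk). For each X it uses Euler's criterion to decide whether x^3 + a*x + b is a square: if it is, X belongs to a point on the curve, otherwise X belongs to a point on the quadratic twist.

```python
# Toy parameters (assumptions for illustration only).
P_MOD = 10007  # a small prime
A, B = 2, 3

def classify_x(x):
    """Say whether x is the X coordinate of a point on the curve or on its twist."""
    r = (x * x * x + A * x + B) % P_MOD
    if r == 0:
        return "curve"  # the point (x, 0)
    # Euler's criterion: r is a nonzero square mod p iff r^((p-1)/2) == 1.
    is_square = pow(r, (P_MOD - 1) // 2, P_MOD) == 1
    return "curve" if is_square else "twist"

def twist_fraction():
    """Fraction of all field elements that land on the twist."""
    on_twist = sum(1 for x in range(P_MOD) if classify_x(x) == "twist")
    return on_twist / P_MOD
```

By the Hasse bound the two counts can differ only by roughly the square root of the field size, so `twist_fraction()` comes out close to one half. That is why a random fault in an X-only representation lands on the twist about every second time.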
So, okay, so let's just do another point validation.
So before we output the point, the result, we check again if it's on the original
curve or not. But you can do the same thing. You can inject another fault right
before you are going to validate it again. That's possible. Because you have a
high probability to move it back from its twist to the original curve. So how are we going
to solve this problem?
So okay, this table is probably too small to read from the back, but that's okay.
So basically we put all of the known attacks and the known countermeasures in
one table, and I want to see how they interact with each other. For example,
this is simple power analysis, timing analysis, and we have a couple of
countermeasures, like we use indistinguishable point addition and point doubling. You can
use the double-and-add-always algorithm, so basically you always do doubling and
addition. No [inaudible] zero or one. If it's zero, you do a dummy operation. So
just to make it look the same in the power trace.
But you see this algorithm is not good for fault analysis because, you know, you
inject the fault in the addition operation. If the result is still correct, that means that
operation is a dummy. That means that bit of K is zero. So this
countermeasure is actually helping another attack.
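A small simulation makes this safe-error effect visible. The sketch below is a model under loud assumptions: the "group" is just the integers modulo a small prime, so a "point" is an integer and K times P is ordinary multiplication, and a fault is modeled as adding 1 to the output of one addition. The function names are invented for this illustration. It shows that with double-and-add-always, a fault in a dummy addition leaves the output unchanged, and that this alone leaks every key bit.

```python
N = 1009  # toy group order (assumption); "points" are integers mod N

def double_and_add_always(k, base, nbits, fault_at=None):
    """Left-to-right double-and-add-always; optionally fault one addition."""
    acc = 0
    for i in reversed(range(nbits)):
        acc = (2 * acc) % N        # doubling, always executed
        summed = (acc + base) % N  # addition, always executed
        if fault_at == i:
            summed = (summed + 1) % N  # modeled fault on the addition result
        if (k >> i) & 1:
            acc = summed           # key bit 1: the addition is real
        # key bit 0: the addition was a dummy and its result is discarded
    return acc

def recover_key_by_safe_errors(k, base, nbits):
    """Attacker view: compare faulty outputs with the correct one, bit by bit."""
    correct = double_and_add_always(k, base, nbits)
    recovered = 0
    for i in range(nbits):
        faulty = double_and_add_always(k, base, nbits, fault_at=i)
        if faulty != correct:      # the fault mattered, so bit i is 1
            recovered |= 1 << i
    return recovered
```

In this model `recover_key_by_safe_errors(k, base, nbits)` returns `k` itself, one fault per bit, without the attacker ever needing a usable faulty value. The attacker only has to observe whether the device still answers correctly.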
And if you look at the whole table you see quite some of them. So the H here
means that this countermeasure is helping another attack. And, sorry, I should
explain that. A check mark here means that this countermeasure is effective against this
attack. A cross means that this countermeasure can be attacked by this attack.
A question mark means we don't know or it's not published. And a dash means
that it's not really related. And a star means that it really depends on how
you implement it.
So what I want to say is that if you look at this table, we actually only
understand a small part of the known attacks and countermeasures, and how
to choose the countermeasures to make a secure implementation is still not
quite clear.
Okay. Now, if we come back and look at the protocols -- so if the adversary
wants to attack an implementation of this Schnorr protocol, where is he
going to attack? Well, he can attack the random number generator. You can, you
know, make it extremely low temperature. You spray some liquid nitrogen on it
so the random number generator gives you biased random numbers. So you
don't have 163 bits of entropy but you only have 80 or something. And then that
significantly lowers your security.
Or you can attack the scalar multiplication part for example, like what we
described before, above. You can also attack the integer multiplication, addition.
You know, you only -- you know, you get R1 or you get X, they are the same.
You only -- if you get one of them, you -- the security of the system is
compromised.
So it will be nice if, you know, when you design a protocol we take this into
account, like how many attacking points are there? And we try to minimize the
attacking points. You know, like make some of these operations less sensitive.
You can attack -- the attackers can have it, but it doesn't help you. I mean that
would be -- that would make our life much easier, all right.
Okay. So on this slide I want to show what would be nice if we had it. It's
basically like I have a dream. So how about we want to reduce the storage. We
want to represent a point with only the X coordinate, or the X coordinate plus one bit.
Because anyway, once you have X, the Y coordinate is one bit of information. And
this probably already exists.
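Storing a point as X plus one bit is what is usually called point compression. Below is a minimal sketch, assuming a toy curve y^2 = x^3 + a*x + b over a prime field whose prime is congruent to 3 modulo 4, so that a square root is a single exponentiation. The talk itself is about binary curves, where decompression works differently; the parameters and the names `compress` and `decompress` are assumptions made for this illustration.

```python
# Toy parameters (assumptions); P_MOD % 4 == 3 makes square roots easy.
P_MOD = 10007
A, B = 2, 3

def rhs(x):
    """Right-hand side x^3 + a*x + b of the curve equation."""
    return (x * x * x + A * x + B) % P_MOD

def compress(point):
    """Keep X and only the parity bit of Y."""
    x, y = point
    return x, y & 1

def decompress(x, y_bit):
    """Recompute Y from X and choose the root with the stored parity."""
    r = rhs(x)
    y = pow(r, (P_MOD + 1) // 4, P_MOD)  # square root when P_MOD % 4 == 3
    if (y * y) % P_MOD != r:
        raise ValueError("x is not the X coordinate of a point on this curve")
    if (y & 1) != y_bit:
        if y == 0:
            raise ValueError("y = 0 has no root with odd parity")
        y = P_MOD - y  # the other root has the opposite parity
    return x, y
```

For any point on this toy curve, `decompress(*compress(point))` gives the point back. The price is one field exponentiation at decompression time, which is the kind of cost the wish list here is trying to avoid on a small device.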
By the way, we don't want inversions there. We want -- we want X coordinates
only, and we don't want inversion. Probably no multiplication either. Only
addition [laughter]. And there's no weak twists. So you cannot use [inaudible] of
weak twists. And a random bit, you know, a random fault, one bit, two bits, N-bit
fault on the parameters, you know, A1 to A6 is not likely to hit a weak curve such
that you cannot, you know, make -- insert fault to the parameters.
And the protocol has minimized attacking points. There is only one or there's no
attacking points, for example. It will be great. And we want lightweight
countermeasures. So every countermeasure you use, you add extra hardware,
you add -- you need more time, you need more energy. So we really don't want
countermeasures. Okay.
So this slide's just to show some implementation results. So we have completed
163-bit ECC, a 163-bit Binary Edwards Curve, a hyperelliptic curve
defined over an 83-bit Galois field, and NTRU, just to show the performance. So
there's a small difference here, but we cannot say,
okay, this one is a bit higher than -- you know, Binary Edwards Curve is
a bit larger than ECC so ECC is smaller than Binary Edwards Curve.
That's not true. A small, you know, change in the configuration of the tool
will give you this difference.
So what we can see is that Binary Edwards Curve, elliptic curve and hyperelliptic
curve can actually have almost the same area. And the power consumption
is also more or less the same.
Now, NTRU is significantly lower if you only use the encryption, but you should
also take into account that this parameter no longer gives you 80 bit security. I
think it's much less. So if you want 80 bit security you have to use a significantly
larger N and Q. So that will look different here. NTRU encryption and
decryption together, they -- the area is not much smaller than ECC.
But the message is that ECC can be made small, it can use only 13 kGates, and
the power consumption is less than 50 microwatts and that's probably already
feasible to put it on a passive RFID tag.
So this is the layout of our chip, and it has been sent to Taiwan, and we expect it
to come back next month, so maybe after two months I can give you more data
and the actual measured power and performance. So, yeah, this part is a full
custom part, so it's really designed by hand just to try to balance the branches in the
Montgomery Powering Ladder. And the rest is some controller and, you know, to
support different protocols.
>>: [inaudible].
>> Junfeng Fan: No. It's only one core. So, yeah, that's it. Thank you.
[applause].
>>: Are there any questions?
>>: If I didn't drive the [inaudible] reader in any way, then don't you have a total
breach of privacy [inaudible] reader and read the tag.
>> Junfeng Fan: Yeah, that's true. So if you really want strong privacy you need
mutual authentication. But that makes this very difficult because you have to
store the public key or, you know, certificate of the reader and that adds much
more storage and, you know, also limits your applications. But that's a good
point.
>>: [inaudible].
>> Junfeng Fan: The design choice?
>>: Yes. I mean you presented different curves --
>> Junfeng Fan: Yeah. So this one, this chip we actually used the elliptic curve.
Once we distribute elliptic curve Montgomery Powering Ladder -- sorry?
>>: So the one [inaudible].
>> Junfeng Fan: Yes. It's a public curve.
>>: It's actually correlated by hand how big will it be and [inaudible] RFID
[inaudible].
>> Junfeng Fan: How big [inaudible].
>>: How big will the chip be if it were designed for a [inaudible].
>> Junfeng Fan: You mean the size of the layout?
>>: The size of the chip, yes.
>> Junfeng Fan: So I think this was like a -- yeah, I don't really remember the
actual size. I can look it up. Because I didn't -- I didn't do the post-layout
part. We did the higher part, the front-end part. Yeah. But it's definitely much
larger than the RFID chip you have now, I think, yeah.
>>: Any other questions?
>>: You were showing the [inaudible] XOR versus [inaudible] you cheated a little
bit because those XOR [inaudible].
>> Junfeng Fan: Yeah, true.
>>: So the [inaudible].
>> Junfeng Fan: That's true. Maybe I [inaudible].
>>: [inaudible].
>> Junfeng Fan: Yeah, because [inaudible] also has like two XOR gates or
couple addition, yeah. Yeah. That's right.
>>: Any more questions? Well, let's thank the speaker.
[applause]