Document 17831833

>> Justin Stiles: Thank you for coming. I see a lot of friendly faces in the crowd and hopefully
the online presence is good as well. I'm Justin Stiles. I'm a part of OSU security and we're
sponsoring along with Research this talk that Josh is going to be giving us. There is a sign-up
sheet somewhere around the room. Raise your hand if you've got it. If you happen to make all
six in person you will get a special prize. Online doesn't count. Sorry, everybody, but that's
kind of one of the fun things that we are doing to get you to show up. We hope it's an
interactive conversation so Josh is ready to take questions during the talk, or you can save them
until after, but make sure you sign up on the sheet if you're here, if you're present. It won't be
something, you won't win a vacation to Disneyland or anything, but it won't be some random
tchotchke. It'll be worth, hopefully, coming for if you started today. Josh in my eyes really
needs no introduction. If you want to know cryptography or have questions on cryptography,
he's kind of the guy. You guys can read his bio. You know he is well qualified, but don't let him
slip up, if you've got any random crypto questions during his talk. I think he's up for the
challenge, so let's give a round of applause for Josh Benaloh [applause].
>> Josh Benaloh: Okay. Thank you very much Justin. I presume I can be heard? Good. Those
of you who are online and didn't know about the prize, you can still make it here by the end of the
talk and get your name on the list. Just to make it clear what's going on, we've got six sessions
planned. This first one is going to be pretty much a broad overview touching on a lot of things
that are very high level and then the remaining five will dive into more detail on various things,
and there will be some flexibility. I've got some things in mind that I want to be talking about,
but if you have things, directions you want to take things let me know and hopefully we will be
able to accommodate. Anyway, to start things off a lot of people like to talk about internet
security. That's kind of why many of us are here. It's worth remembering that the internet was
not designed at all with security in mind. It was designed basically for a small number of
mutually trusting and trustworthy entities to exchange whatever they wanted to. And pretty
much sending anything on the internet is like sending a postcard through the mail in the case
where you don't really trust the post office. It's a difficult environment to work in, but we can
manage some things. What does a typical internet transaction look like? Assume that we've
got you out there browsing the web and some merchant you maybe want to buy
something from. Let's say you want to get a book from Amazon. You may have an exchange.
Let me look at this. Here is the information. I like that. People who looked at that seemed to
have looked at these books also, so you look at those books. Eventually, maybe you get to a
point where you say, you have convinced me. I want to buy something. Amazon is very happy
and they say great. Please send us your credit card number. Well, okay. Here's my credit card
number. Remember that thing about postcards and whatnot. This is not a very good state of
affairs, so we want to find a way to do better than this. Ideally, everything, that whole
conversation is protected so that people can't see my browsing history and my Amazon
browsing history just in that little environment. But at the very least, we would like to encrypt
or in some way protect the credit card numbers so that only the merchant sees them. Even that's
a problem and I will talk about that later. Let's try to get at that as our first goal. This is going
to take us a while. It's going to take us most of the next 30 minutes or so. To begin with, it's
worth mentioning Kerckhoffs's Principle. This dates back about 130 years and says that if you
are building a security system, the security of the system itself shouldn't
depend on everything being hidden. You shouldn't use security through obscurity. You should
be able to make everything you are doing public except some very small piece, which is the
key. That's the goal. It's interesting. At that time locksmiths adopted this wholeheartedly and
that's an industry that has now flipped the other way and they are like you shouldn't be able to
understand how locks work. Knowing how locks work makes you a special person who we have
to keep track of, so that's an industry now that is based very largely on security by obscurity,
not knowing the tricks in how to pick a lock, for instance. There are other industries that have
gone hard in that direction of trying to hide things, but interestingly, some industries have
embraced this coming out in the open. Not just cryptography but, for instance, the gaming
industry, Las Vegas, you can go to conferences and workshops where they'll talk about all of
their vulnerabilities completely openly, what tricks have been used at the blackjack tables in the
last 12 months and whatnot. They'll talk about things and they decide that sunshine is a good
thing and I think sunshine is a good thing. I think this principle is worth following, but it does
mean you have to be very careful with your keys. When I talk about keys, let me talk about
what I mean by a key. First of all, I'm not talking about a PIN. This is just a little tiny nothing.
I'm not really talking about a password either. This is something small, something pretty
guessable most of the time. What I mean when I talk about a key is a big random string of at
least 128 bits. The reason that this is important is that PINs and passwords have a place but they
only have a place where off-line attacks are not possible. Think of what happens if you try to
enter a password too many times. You get locked out very quickly. If I had a way of checking
through a password space without being online, without interacting, then passwords and PINs
wouldn't work at all. For cryptography though, if we're going to encrypt something, an attacker
can check off-line which possible key will decrypt it. When you have an off-line attack,
then anything like a PIN or a password is terrible. Everything may be subject to an off-line
attack and it's something we need to consider. An attacker just as a rough estimate, might be
able to search through as many as two to the 64 values. That's a good indication as to the size
of the space that an attacker may be able to search through. Think about what that is. That
could be all numerical PINs of fewer than 20 digits. There are very few people who I have seen
that enter 20-digit bank PINs. Passwords, fewer than 14 lowercase letters, most of our
passwords aren't quite up to that. We could add a few characters of the alphabet. How about
alphanumerics, still fewer than 12 characters or printable passwords, still, anything fewer than
10 characters is in some sense searchable. These things still work for the online only
application of entering a password to get access to something and you get locked out. But if
you are trying to use this in a cryptographic manner as a key, you're going to have problems.
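
The size estimates quoted above can be checked with a few lines of arithmetic. This is only a rough sketch: the 2 to the 64 budget is the speaker's estimate, and the alphabet sizes (10, 26, 62, 95) are assumptions about what each class of password contains.

```python
# Which PIN and password lengths fit inside a search budget of 2**64 guesses?
SEARCH_BUDGET = 2 ** 64  # the rough attacker capability quoted in the talk


def max_searchable_length(alphabet_size: int, budget: int = SEARCH_BUDGET) -> int:
    """Return the largest length L such that alphabet_size**L <= budget."""
    length = 0
    while alphabet_size ** (length + 1) <= budget:
        length += 1
    return length


# Alphabet sizes are assumptions made for this illustration.
ALPHABETS = {
    "decimal digits": 10,
    "lowercase letters": 26,
    "mixed-case letters and digits": 62,
    "printable ASCII": 95,
}

for name, size in ALPHABETS.items():
    print(f"{name}: searchable up to length {max_searchable_length(size)}")
```

Under these assumptions the digit, lowercase, and printable cases come out at 19, 13, and 9 characters, which matches the "fewer than 20", "fewer than 14", and "fewer than 10" figures in the talk.
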
Don't even think about using any user chosen passwords as an encryption key, please, nor
anything that's been derived deterministically. That's just as bad. Basically, given some
ciphertext, an attacker can just do some sort of an exhaustive search, a guided search through
the space of up to two to the 64 things and crack your user chosen password or anything.
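
As a toy illustration of that exhaustive search, the sketch below derives a key deterministically from a 4-digit PIN and then recovers the PIN off-line from the ciphertext alone. The hash-based key stream, the key derivation, and the message format are all hypothetical choices made for this example, not something described in the talk.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy key stream: SHA-256 of key plus a counter (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing the data with the toy key stream."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


def key_from_pin(pin: str) -> bytes:
    """Deterministic key derivation from a PIN: exactly what the talk warns against."""
    return hashlib.sha256(pin.encode()).digest()[:16]


def crack_pin(ciphertext: bytes, known_prefix: bytes):
    """Off-line search: try every 4-digit PIN, keep the first that decrypts sensibly."""
    for guess in range(10_000):
        pin = f"{guess:04d}"
        if xor_cipher(key_from_pin(pin), ciphertext).startswith(known_prefix):
            return pin
    return None


ciphertext = xor_cipher(key_from_pin("4821"), b"CARD 4111 1111 1111 1111")
recovered = crack_pin(ciphertext, b"CARD ")
print("recovered PIN:", recovered)
```

No lockout applies here because the attacker never talks to anyone; the only limit is how many guesses can be computed.
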
Basically, there's no such thing as a user chosen key. Think of a key as something that is really
just a big random value. Something that doesn't apply that much anymore but I decided to
keep this in is dictionary attacks, because you do hear people talk about these against the
space of passwords, say. The idea is, basically, it can often be the case that you try one
password and encrypt this common password one, two, three, four, five, A, B, C, D, E,
whatever, and then you can sort of scan through and try it to see where this encryption
matches something because you saw the way it was encrypted. There is a little trick that is
often used here of make sure everything is encrypted differently and use some sort of a salt,
some sort of initial value. It can be public, that value but make sure it's a different value for
each encryption in some way and we'll be using that a lot more later in the next session. Let's
get into the meat of things, into the S’s and start talking about what you might think of as
encryption. And encryption starts out as symmetric encryption. Basically, symmetric encryption
is there is a key; I've shared a key with somebody else. I use the key to encrypt; you use the key
to decrypt. That is traditional encryption as it existed for literally millennia. Caesar ciphers, in
fact, can be thought of as that. That goes back quite a way. How do we do this? When we talk
about symmetric ciphers, there are usually two major classifications, two broad groups of
symmetric ciphers we can talk about: stream ciphers and block ciphers. I'll spend a few
minutes talking about each and show you how they work, and we'll spend more time looking at
these in a future session. For stream ciphers there are a few examples, but really it's RC4. This one is like
everything. This is like 99.9 percent of what you see. A5/1 is used in cell phones, so that has
come up a little bit more lately, but it used to be that like 99 percent of all encrypted traffic
anywhere in the world was RC4. What you do with any stream cipher is basically you build a
pseudo random generator. If you've ever seen a pseudo random generator, thought about
these, these things are very, very important for many applications, but all a stream cipher is is
just a pseudo random generator and all you do is you take the stream of output bits from the
pseudo random generator and XOR it with plaintext to form ciphertext. Here's pretty much
what it looks like. You take your plaintext. You take your seed and put it in your pseudo
random number generator and you create a stream of key bits and you just XOR every bit and
you get a bit of ciphertext. How do you decrypt? You do exactly the same thing. You can even
use exactly the same function. The ciphertext goes in the top, gets XORd. When you XOR with
the same key bits twice you get back to where you started; you get the plaintext back. It's very
simple, very easy, so much so that I will show you one common example. I talk about alleged
RC4. There's an interesting history here. RC4 is a trade secret of the RSA Corporation. It has
presumably been leaked. There have been leaked implementations and leaked descriptions of
what this looks like, but technically still if you use RC4, you have to pay RSA Corporation, but as
a trade secret this thing called alleged RC4 has been discovered. It seems that on all inputs and
all outputs, for anybody who has tried it, it does exactly the same thing. But we will talk about
alleged RC4 here. It's quite simple. You take a byte array, 256 bytes and you stick in all possible
values in order. You also take another array of exactly the same size and you replicate your key
as many times as necessary to fill this array. It's completely flexible. A 128-bit key is
commonly the case. You can use a shorter key. You can use a longer key, just repeat the key;
fill this array with that key, repeat it over and over. Then just for the setup you go through this
little tiny loop, 256 times and what you're doing is just shifting this array around. You will still
have all 256 possible byte values in that array; they will just be in a different order.
What you do is take what's in position i here and what's in position i here and add them with a j
which started at 0 and compute a value and then just swap two elements in this array. And just
go through a bunch of times and do a swap and get it all scrambled up. And this key caused the
swapping to be done. Once this initialization is complete, these 256 steps of initialization, then
you can start streaming out bytes and the way you do this is i starts at 0. You just keep
incrementing i. You keep incrementing j not by 1 but by the value in the i position and you do
another swap each time, so you just keep on swapping those values in this array and you
compute a new index which is the sum of the two values that you swapped and you output whatever is in
that position. This is the entire pseudo random number generator known as RC4. You can put
it on a T-shirt. You can put it into slides. You can put it in any way, but it's still alleged. It is
extremely fast. It's extremely nice. It's got a lot of good properties. It was discovered years
after it started being used that there's a little bit of a bias in the early bytes of the output that has
been used to break some things, so the current recommendation is just sort of throw away the
first thousand or so bytes and just start a little bit later. Stream ciphers are great. They are
typically very, very fast. You can see this thing just screams. You just do a few steps and then
you get another byte. Then you just keep on going, 10 cycles or less per byte. They are very
simple. Encryption and decryption can be identical, literally, but there are some problems. I'll
hit the first one right here. What happened there? It wasn't on my machine. You can probably
guess what that operator is supposed to be. I don't know why that -- but anyway. If you ever
reuse the same key stream, ever, ever, you have a problem. Here's plaintext one. Here's plaintext
two, ciphertext one and ciphertext two. If you XOR these two ciphertexts what happened to
that key? It's gone. The key XORed with itself. You just get these two plaintexts XORed together.
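
The generator described a little earlier, and the key-stream reuse problem just described, can be sketched together as follows. This follows the publicly circulated "alleged RC4" description; the key and the two messages are made up for the example, and the sketch does not discard the early output bytes as the talk recommends.

```python
def rc4_keystream(key: bytes):
    """Yield key-stream bytes following the public 'alleged RC4' description."""
    state = list(range(256))              # all 256 byte values, in order
    j = 0
    for i in range(256):                  # setup: 256 key-driven swaps
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    i = j = 0
    while True:                           # output: one swap per byte produced
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        yield state[(state[i] + state[j]) % 256]


def rc4_xor(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR the data with the key stream (same function both ways)."""
    return bytes(d ^ k for d, k in zip(data, rc4_keystream(key)))


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key = b"hypothetical demo key"
plaintext1 = b"attack at dawn!!"
plaintext2 = b"retreat at noon!"
ciphertext1 = rc4_xor(key, plaintext1)
ciphertext2 = rc4_xor(key, plaintext2)      # same key stream reused: the mistake

# Decryption is the same operation as encryption.
assert rc4_xor(key, ciphertext1) == plaintext1
# With a reused key stream, the key drops out of the XOR of the two ciphertexts.
assert xor_bytes(ciphertext1, ciphertext2) == xor_bytes(plaintext1, plaintext2)
```
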
And from two plaintexts XORed together you can usually tease those apart. Now you have the
two plaintexts and you can derive the key stream that was used and the packet and anything else. This
happens surprisingly often even amongst people who know it shouldn't happen. A few years ago I
was working with a team in Office that will not be named and using a stream cipher and saying
yes, yes. We know about all the pitfalls. We are very careful; we don't use the same key
stream. What we do to make sure of it is we derive the key stream based on the filename with
the whole path. Okay. Think about this for a minute. I'll point it out. That's great for separate
files, but what if you change a file? And I watched 12 jaws around the table simultaneously
drop. Yeah, okay. It's a mistake, but basically stream ciphers are fragile. It's really easy to mess
them up. Here's another way in which they can be messed up. Take a look at basic stream
cipher; how we designed it is bit by bit. Every single bit encrypts to a single bit. There's no
interaction across the bits. I hear people joke sometimes that this is terrible because anytime
there is a 0 here, that means that bit went unencrypted. [laughter]. Okay. But there's another
problem when you're doing it bit by bit. It's very easy for an adversary to alter the ciphertext in
a way such that it will have a very specific known impact on the plaintext. You may have very
well-structured text. Maybe here is some sort of a bank transfer notice. Please, transfer two
dollars and you know exactly what the form of this is. Here it comes. It's encrypted and I know
that the second byte in this value is going to have the amount. I can just take the ciphertext
and just flip one little bit and suddenly the bank transfer is an encrypted version of this. That
also leads to undesirable results, may be undesirable for Bob, no for Alice in this case. But you
can see that it can cause some problems. These are the two biggest problems with stream
ciphers. The reason that fragility exists so much with them is lack of integrity checking to make
sure that this hasn't been done and key reuse which happens to even people who think they
are being careful to avoid it. Microsoft policy requirements, part of the SDL, don't use stream
ciphers. They are so easy to mess up. Don't use them. They are nice; they have lots of good
properties and if you really, really have a good reason, you come to the crypto board and you
plead your case and we'll make sure that you are doing it okay, because we do use them in
places where they are reasonable to use, but as a rule avoid them. The alternative is block
ciphers. A block cipher basically looks something like: you take a key, you take some data and
you churn for a while and out comes ciphertext. Let's take a look at what this might look like.
There are a bunch of ways of building these things, but I'm going to show you a really clever
way of building block ciphers that works very well. Lots of block ciphers have been done this
way. It was invented a little over 50 years ago by a guy named Horst Feistel. The basic idea is
you want to make something that's a really ugly function that takes bits and scrambles them all
up and is really hard to figure out what is going on. That's what encryption is in some sense.
But it has to be reversible. Reversibility is kind of hard, but if you want to encrypt, you've got to
be able to decrypt. How do you make this really, really ugly and nasty and difficult but also
make it reversible? Here's this really clever trick. You take your input up here and you break it
into two halves, the left half and the right half. The right half goes through the ugliest function
you could imagine and it doesn't have to be reversible. What you do with it is take that, XOR it
with the left half as it comes through, and that becomes the new right half. The old right half just goes
through unchanged here. The nice thing about this is that even though you've gone through
an irreversible function here, you can reverse this. You can get from the output back to
the input by just going up and doing the same thing because you still preserve the input into
here. You've still got it so you can reproduce it and do the XOR and get back to where you
started. That works pretty well. Except, wait a minute, we left this half unchanged. Okay.
Sure, but we can do it again. Coming from here now we will do another. Now both halves have
been messed up and you actually can do both of these things essentially in parallel because you
can see this is coming down to here. You have to wait a little bit, but you can compress the
effort, but now both halves have been done. And things go through much more easily and you
have done some encryption. This is good. It's not really good enough. Usually when you do
these things you keep on churning. One thing you do, put in the key here. Whatever key you
had, the same key doesn't have to be used in both of these functions. You can mix it up a little
bit, but also you iterate, so typically one of these Feistel ciphers is iterated for at least 10
rounds, 16 rounds, often more. You use different sub keys for each round and with this, even if
your ugly function in the middle is very weak, you can get a very strong cipher by just iterating
enough. I would never recommend that you roll your own cipher. If you had to, don't do it.
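
The Feistel construction described above can be sketched in a few lines. This is a toy for illustration only, in keeping with the warning not to roll your own cipher: the round function (a truncated SHA-256) and the 16 one-byte-pattern subkeys are hypothetical stand-ins, not anything from DES or from the talk's slides.

```python
import hashlib


def round_function(half: bytes, subkey: bytes) -> bytes:
    """A deliberately non-invertible mixing function (hypothetical choice)."""
    return hashlib.sha256(subkey + half).digest()[: len(half)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def feistel_encrypt(block: bytes, subkeys: list) -> bytes:
    """Each round: new left = old right, new right = old left XOR F(old right)."""
    half = len(block) // 2                 # block length is assumed to be even
    left, right = block[:half], block[half:]
    for subkey in subkeys:
        left, right = right, xor_bytes(left, round_function(right, subkey))
    return left + right


def feistel_decrypt(block: bytes, subkeys: list) -> bytes:
    """Undo the rounds in reverse order; F itself is never inverted."""
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for subkey in reversed(subkeys):
        left, right = xor_bytes(right, round_function(left, subkey)), left
    return left + right


subkeys = [bytes([r]) * 4 for r in range(16)]   # 16 made-up round keys
block = b"8 bytes!"                              # one 64-bit block
scrambled = feistel_encrypt(block, subkeys)
assert feistel_decrypt(scrambled, subkeys) == block
```

Decryption works because each round leaves one half untouched, so the input to the round function can always be recomputed and XORed back out.
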
But if you had to, if you take some really lame little ugly function here and iterate it 100 times
through a Feistel network like this, you'll probably have a pretty good cipher. It'll be slow. The
trick is making it so you don't have to iterate it for 100 rounds, so you can reduce the number of
rounds and make it fast and efficient and still very strong. If you iterate it for 100 rounds you
will probably get something that is very good. I'll just take a moment to show you an old style
cipher that uses this. It was very common in the old days, not so much anymore, but the Data
Encryption Standard, DES, was the first widely used public block cipher. The main
problem, besides being a little slow, is that 56-bit key. Remember we talked about being able
to search through a space of size 2 to the 64. This is smaller than that. You can do exhaustive
search on this. Still, it gives a sense of what can be done and how a real cipher can be built. Also
the 64-bit plaintext block, by the way, is a little bit short; we use larger block sizes now. A Feistel cipher is
done here, 16 rounds. Let's take a look inside one of these rounds for DES. All that's going on is
one of these Feistel rounds and the only thing you have to describe is the ugly function in the
middle, and in the case of DES it looks pretty much like this. I'm oversimplifying a bit, but you
take 32 bits in because the block was 64 bits. The right half is now 32 bits. You take a piece of the key,
32 bits there, XOR them together, take this, break it into eight 4-bit blocks. You have a substitution
table for each block that says if you see these four bits, you substitute those four bits and then
you do a permutation and then you use 16 rounds of this. Now the actual DES is a little
bit more complicated but not a lot. What's done is these 32 bits are expanded to 48 bits by every
block taking a neighboring bit off of each neighboring block and just absorbing it. Then the
substitution tables go from six-bit values to four-bit values and the subkey for each round is 48
of the 56 bits, just taking some subset of the 56 bits each time. That's all of DES. That's a full
symmetric encryption function, a full block cipher. Let's get back to where we started. We
know how to encrypt now. Great. We can go back and solve this problem. We get down to the
point where we want to buy something. You get the question of what's your credit card
number, and you say here it is. I can encrypt my credit card number because I know how to
encrypt. I got a key. I can encrypt. Except what likely happens is I want to make a purchase
and Amazon might say okay. Encrypt your credit card number with our shared key because this
is how we're doing this, right? And if this is the first time you have gone there you say, what's
that shared key again? And we could have an infrastructure that says okay. First time you go to
Amazon you say I want to set up an account and via snail mail three days later comes an
envelope with the key in it. You type this 128-bit key into your browser and now you can buy
your book. If we had no alternatives I guess this would be the best business model we had but
hopefully we can do better. To do better, we need to fall back on asymmetric encryption.
Asymmetric encryption deals with the case of a user and a merchant which have no prior
relationship and allows someone to encrypt a message without knowledge of the decryption key.
Somebody in the back of the room right now who I have not met, I could give you instructions
and say okay. I want to send you a secret message. I tell you what to do, do this, this and this,
tell me what you got. Pick some random numbers, do the following things and tell me what the
answer you got is. I will use that answer to encrypt my message to you and only you will
understand. That's what we can do with asymmetric encryption. I'm going to show you how.
Again, we'll talk about it in more depth later, but most of asymmetric cryptography, whether
it's RSA or Diffie-Hellman or elliptic curve cryptography, uses this equation [z = y^x mod n] in some form. When
we deal in elliptic curves, we usually write it differently, but I prefer writing it this way even for
elliptic curves, making the group operation multiplication and it still looks this way. Basically,
this is an equation in some form or another that we need to work with and solve. Let's look at
this equation a little bit more. When the unknown is over here [z], you are just computing this
modular exponentiation and it turns out it can be computed efficiently. I'll spend a few
minutes on that. Let's look at some other cases. When this [x] is the unknown, this is known as
the discrete log problem. If we eliminated the mod n for a while, then you are just computing
the logarithm of z to the base y. We can do that. If there's no mod here that's easy. There are
lots of ways of doing it, but you could always do it just sort of by a binary search and
interpolation and get closer and closer. What the mod n does here is it makes things chaotic
because taking things mod n -- just to make sure everybody in the room knows, mod n is just
take the remainder when divided by n. It's basically just dropping off everything but the low
order bits, or the low order portion of anything. Think about what's happening there. We're
just keeping the low end, so think about like you have a timer that's running really quickly and
you see the low order stuff going wild and the high order stuff is slowly incrementing, but the
low order stuff is kind of chaotic. And if we're just taking that and we're trying to find values
that cause something to happen at the low order, we're trying to do a binary search, which is
not going to work because if we change this just a little bit there are small changes up at the
higher end order stuff, but the low order stuff gets you something completely different. Binary
search just doesn't work, and other methods just don't work. We've got this thing that is
believed to be hard. We don't have a proof that it's hard, but it seems to be hard. What about
some other cases? If y is the unknown, this is also believed to be hard to solve, except if you
happen to know the factorization of n. This is root finding, which again is easy if it's not
mod n. It's just finding the xth root of z, but if you know how to factor n then it turns out you can
compute roots mod n. I'm not going to tell you how today. I'll tell you how in a month or so if
you come back to that session. It turns out it's not that hard. The final one just for
completeness, I like to look at this. What happens if n is unknown? There's really not a lot of
work there. I've heard of a couple people looking at that. I think it's an interesting problem.
How hard is this? I don't really know. It might be worth investigating. Might be able to do
something with it. Let's look first at the how do we compute this, y to the x mod n. You might
just compute y to the x mod n by taking y to the power x and reducing it mod n. The problem is
we like to use for integer operations, integers of at least 2048 bits. That's another
SDL requirement: RSA, similar things, 2048-bit minimum. If you have 2048-bit integers, x,
y and n are 2048 bits, you raise y to the x power and you get something of about that many bits.
Just for comparison, that's how many particles there are in the universe. Storing this number is
going to be a problem. [laughter]. We can't come close to storing that value, so we're going to
have to find another way. Fortunately, we can be a little bit more clever. We can do this mod n
reduction with each multiply. We don't ever let the value get very big. We can multiply and
then reduce, multiply and reduce and keep the value from ever getting big. We can solve the
storage problem that way, but to do this exponentiation by just doing 2 to the 2048 multiplies,
we have to do a lot of them pretty quickly. That's about how many we would have to do every
second just to finish before the sun burns out. A good processor now, four gig, 2 to the 32
computations per second, so we're getting there, just a little slow, okay. We want to be able to
do our encryption a little faster than this. Here's the trick. Think about how you learned to do
multiplication in elementary school. Multiplication is just repeated addition, nothing more to it
than that, but you don't multiply 23 x 49 by adding up 49 copies of 23. There are tricks.
Basically what you could do is repeated doubling. We work base 10, so we don't do it by
doubling; we do it by tens, multiples of 10. It's actually a little bit harder to learn to do it
that way. Think of if we were doing everything base 2. To compute x times y what we can do is
just compute y -- it's not obvious why I've highlighted a few of these yet, but it will be. y, 2y, 4y,
8y, 16y and sum up whichever of these we need to get x times y. Sorry, I should have mathed
this initially, but here's the example. Suppose you want to do 26y. I really should have this one
tilted in a way. I see people are having trouble seeing the things low, sorry. I don't know that
there is anything I can do about this at this point. I'll just try to lead and make it clear. Suppose
you want to compute 26y. If we write it out in binary, write 26 out in binary, you find that you
need 16y + 8y + 2y. And I don't know why I made that 2 green, sorry. Instead of doing 26
additions, or maybe 25 additions, you're doing four doublings and two additions and the
doublings are additions; they are just easy additions. We save a lot of work. Exactly the same
thing can be done with exponentiation. Suppose we want y to the x. We just repeatedly
square the value, and all these things are done mod n, by the way, because we know we want to
reduce it mod n, keep it small. Don't use up the universe. So we do these squarings and then
we just multiply together the things we need to get y to the 26th power, so we're not doing 25
multiplies. We're doing four multiplies here, four easy multiplies here and two more multiplies,
much faster. If we count it up, now if we were trying to do an
exponentiation of 2000-bit values, then it takes about 3000 multiplications, 3000 ordinary
multiplications to do it. Basically, 2000 squarings to get all of the things that we need and
about 1000 side multiplications, because roughly half of these will have to be multiplied
together, depending on the binary representation of this x up here. Make sense? Questions?
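
The doubling trick for multiplication and the squaring trick for exponentiation can be sketched side by side. The function names are made up for this illustration; in practice Python's built-in three-argument `pow` already does modular exponentiation efficiently.

```python
def multiply_by_doubling(x: int, y: int) -> int:
    """Compute x * y using only doublings and additions (x must be >= 0)."""
    total = 0
    term = y                    # runs through y, 2y, 4y, 8y, ...
    while x > 0:
        if x & 1:               # this binary digit of x is 1: include the term
            total += term
        term += term            # doubling is just an easy addition
        x >>= 1
    return total


def mod_exp(y: int, x: int, n: int) -> int:
    """Compute y**x mod n by repeated squaring, reducing mod n at every step."""
    result = 1 % n
    square = y % n              # runs through y, y^2, y^4, y^8, ... (all mod n)
    while x > 0:
        if x & 1:               # a "side multiplication" for each 1 bit of x
            result = (result * square) % n
        square = (square * square) % n
        x >>= 1
    return result


# 26 is 11010 in binary, so 26y = 16y + 8y + 2y and y^26 = y^16 * y^8 * y^2.
assert multiply_by_doubling(26, 7) == 26 * 7
assert mod_exp(5, 26, 1_000_003) == pow(5, 26, 1_000_003)
```

Because every intermediate value is reduced mod n, nothing ever grows beyond about twice the bit length of n, and the number of loop iterations is the bit length of x rather than x itself.
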
Okay, good. Great. We can do exponentiation quickly. Now let's do RSA. Let's show how we can
use root finding to solve the problems that we have here of asymmetric encryption. This is
going to be RSA in two slides and really, actually, most of it's in one slide but two slides get you
everything you want to know. I tell you, you want to create an RSA key? Take two big primes,
multiply them together. Publish the product. Tell everybody the product. That's public now.
We typically use the x-value, the exponent, 65537. It's two to the 16th +1. It has a very nice
binary representation. A 1, a whole bunch of zeros and a 1, right? So we have very few of
those side multiplies, a nice little convenience. It doesn't have to be a huge exponent for the
encryption. You tell people to encrypt a message y by just taking y to the power x, mod your
chosen n, and since you know the factorization of n, you can solve this discrete root problem
and invert it. If you know the factorization of n, you can do this. If you don't know the
factorization of n, it is a secret. It is opaque to you. That is RSA. Because if you take the xth
power and then the xth root, you get back to where you started. That's all it is. In fact, RSA has
another really nice property and this is slide 2 of RSA, signatures. We know that you can
encrypt and then decrypt the encryption and get back to where you started, but there's also
this odd little feature that you could decrypt first and then encrypt, take the xth root and then
the xth power and you will also get back to where you started. Looks bizarre. Why would
anybody want to do that? Taking the root is something that nobody else can do, so taking the xth root of y
effectively serves as a signature on y. It's something that only I could produce, so I have signed
y by producing a value which is the xth root of y and anybody can verify that signature by raising it
to the xth power mod my n and they get y back. That's a digital signature. Great. We got
encryption and digital signatures all in one simple thing. Back to our confidential data transfer,
our purchase from Amazon. Now we are in much better shape. I want to make a purchase,
really, I do. I'm trying. Amazon comes along and says here's my key, now encrypt with that. I
send my credit card over that channel. Looks good. Everybody happy? No. Okay. Why not?
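
The two RSA slides described above (encrypt by raising to the xth power, decrypt or sign by taking the xth root) can be sketched with toy numbers. Everything here is an assumption for illustration: the primes 61 and 53 and the exponent 17 are far too small to be secure, real keys use primes of about 1024 bits each with the exponent 65537, and real systems add padding that this sketch omits. The modular inverse via `pow(x, -1, phi)` needs Python 3.8 or later.

```python
# Key creation: two primes, kept secret; their product is published.
p, q = 61, 53
n = p * q                         # public modulus
x = 17                            # public exponent (65537 in practice)

# Knowing the factorization lets the key owner compute the "root" exponent.
phi = (p - 1) * (q - 1)
root_exponent = pow(x, -1, phi)   # secret: raising to this takes xth roots mod n


def encrypt(y: int) -> int:
    """Anyone can do this: raise the message to the xth power mod n."""
    return pow(y, x, n)


def decrypt(ciphertext: int) -> int:
    """Only the key owner can do this: take the xth root mod n."""
    return pow(ciphertext, root_exponent, n)


def sign(y: int) -> int:
    """A signature on y is the xth root of y, which only the owner can compute."""
    return pow(y, root_exponent, n)


def verify(y: int, signature: int) -> bool:
    """Anyone can check a signature by raising it to the xth power mod n."""
    return pow(signature, x, n) == y


message = 65                      # messages are numbers smaller than n
assert decrypt(encrypt(message)) == message
assert verify(message, sign(message))
```

Nothing in this exchange tells the sender whose n and x they were handed, which is the gap the question to the audience is pointing at.
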
Anybody want to help? I'm sorry. Try again. Are you sure it's their key, I thought I heard. Yep.
That is a big problem. You might have an intermediary hanging out there. It's not just you and
Amazon, but there may be somebody in the middle. You start browsing through the Amazon
collection and the intermediary intercepts everything and forwards it on and gets the responses
and forwards them back-and-forth and eventually, you get to a point where you want to make
a purchase. The intermediary happily says, sure, I want to make a purchase, and Amazon says
here's my public key, and the intermediary says and here is my public key, but it's different. It's
a tilde, a crooked E. Uh-huh. They send back a crooked encryption function and you encrypt
your credit card number very dutifully to the intermediary and now the intermediary has your
credit card number. And if the intermediary is clever, it will then take your credit card number
and re-encrypt it for Amazon so that you get your book or blender or whatever it is you were
trying to buy and you are none the wiser and Amazon is not the wiser, but your credit card
number is now in somebody else's hands. This isn't good. So we add a feature, a digital
certificate. This is, basically, a statement that somebody has signed saying what Amazon's
public key really is. If we have a certificate authority that can do this signature and we believe
that certificate authority, then we know what Amazon's real key is. The problem is now we have
just bootstrapped the problem or put it back down to how do we believe the certifying authority.
Baked into our browsers are the root keys of various certifying authorities, so if they're good
and they do their job well, then we can trust them and everything is good. Of course, there are
a lot of certifying authorities in the browser, maybe some more trustworthy than others, maybe
some more careful than others. If you look in your browser you can still find some old keys that
have been explicitly disallowed because they were issued by the certifying authority VeriSign to
Microsoft because somebody called them up and said I'm Microsoft, please sign my key. Please
certify it and they did and nobody at Microsoft seems to know who that person was. Those
keys never seem to have shown up, but you have to be careful. Okay, so we've got a cert. Can
we do this now? I want to make a purchase. Amazon sends back a public key and a certificate
saying this really is Amazon’s public key and now I encrypt. Okay? Are we good now? We're
better, but you're wise. You know not to trust a cryptographer.
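A minimal sketch of the RSA operations and the certificate check described in this turn, assuming toy primes and Python 3.8 or later. The function names, the tiny primes, and the SHA-256 digest of "name:key" are illustrative assumptions, not anything from the slides, and this is the raw, unpadded RSA that should never be used in practice.

```python
import hashlib

X = 65537  # the public exponent from the talk, 2**16 + 1

def make_key(p, q):
    """Return (n, d): the public product and the secret root-taking exponent."""
    n = p * q
    # Computing d needs (p - 1) * (q - 1), i.e. the factorization of n.
    d = pow(X, -1, (p - 1) * (q - 1))
    return n, d

def encrypt(y, n):
    return pow(y, X, n)          # anyone: y to the power x, mod n

def decrypt(c, n, d):
    return pow(c, d, n)          # key owner only: the x-th root mod n

def sign(y, n, d):
    return pow(y, d, n)          # same operation as decrypt

def verify(y, sig, n):
    return pow(sig, X, n) == y % n

def digest(name, key_n, modulus):
    """Hypothetical certificate content: a hash of the name and public key."""
    h = hashlib.sha256(f"{name}:{key_n}".encode()).digest()
    return int.from_bytes(h, "big") % modulus

# Toy keys (hypothetical values, far too small to be secure).
ca_n, ca_d = make_key(1009, 1013)          # certifying authority
amazon_n, amazon_d = make_key(1019, 1021)  # the merchant
mallory_n, mallory_d = make_key(1031, 1033)  # the intermediary

# The authority signs a statement binding the name to the merchant's key.
amazon_cert = sign(digest("amazon", amazon_n, ca_n), ca_n, ca_d)

def key_is_certified(name, key_n, cert):
    """Client-side check using the authority key baked into the browser."""
    return verify(digest(name, key_n, ca_n), cert, ca_n)

if __name__ == "__main__":
    card = 4242
    print(decrypt(encrypt(card, amazon_n), amazon_n, amazon_d) == card)
    print(key_is_certified("amazon", amazon_n, amazon_cert))
    print(key_is_certified("amazon", mallory_n, amazon_cert))
```

The last check is the point of the certificate: the intermediary can present its own key, but it cannot present a statement signed by the authority that binds that key to the merchant's name.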
>>: Replay attack.
>> Josh Benaloh: Replay attack, perfect, thank you. I'll pay you later. [laughter]. Just give me
your credit card and I'll be fine. This seems to work just fine. Here's a perfectly good exchange
and somebody overheard this perfectly good exchange. And then later on whoever that
eavesdropper was who overheard this goes back to Amazon and says I want to buy some really
expensive, something really nice. And Amazon says, okay. Here's my public key and the
certificate. And all I have to do is say sure. My credit card number is that thing, whatever that
thing that went over the wire last time, yeah, it's the same one. And Amazon dutifully decrypts
it and suddenly somebody else gets charged. So we're going to add one more component to
this. We're going to add what's called a nonce, which is a coined word from number once. A
nonce is just a random value that goes in and it changes every time and you send the
encryption of the credit card together with a nonce and the next time Amazon sends it to
someone else they will send a different nonce and if the same nonce doesn't come back each
time then Amazon will know this is wrong, disallow it. Are we done? We're done to the extent
that this is what is done today. I'll show you how. I'll claim it's not good enough, but let's take
a look at how this is used today. There is a sordid history of SSL, which probably most people
here have heard of and TLS, which probably most people here have heard of, and PCT which is a
Microsoft standard which probably very few people here have heard of. The history, roughly, is
SSL came out in 1994. It was produced by Netscape. Now, 1994, those of you here who are
techno history buffs might recognize as this was the time of the big browser wars. Actually, this
was a little bit before they started, but this was still the first salvo, essentially, and Netscape
had this nice secure protocol and we weren't going to use, well we kind of had to, compatibility,
but we didn't want to be bound to Netscape, so we came out with our own protocol called
private communication technology. Part of the excuse for coming out with it was we, and when
I say we in this case I mean especially Dan Simon who now works in Windows Phone, I think,
found a big bug in SSL2 and we said we can fix that bug. We've got something better and we've
got these nice new features and people should be using PCT and PCT was put in browsers and it
was default and it was even used occasionally [laughter]. Next year Netscape came out with an
improved SSL3, which fixed the bug and was very common. Lots of people used it. It was good
and it got some new features. We are still competing with them. We have to come out with
our next version, but how are we going to, even though we put in lots of enhancements in the
new version, beat SSL3 with our new version? The trick is we come out with version 4.
[laughter]. And we put out version 4 which had some nice enhancements and 4 is bigger than 3
so it must be better. And it was enough eventually to bring Netscape to the table in an IETF
meeting and we eventually all got together and agreed on TLS which, version 1 existed for a
long time. There is now a slightly newer, better version, but still it's a very, very nice protocol.
It has a lot of good features. Basically, the way it works, the thing that has made this survive for
so long is the handshake. The trick of the handshake is you've got a client and a server and the
client starts off with okay. I want to set up a secure communication channel. Here are the
protocols I understand. Here are the ciphers I understand. Here are the tools. Here are the
languages I speak in some sense. The merchant can go through the list that the client provides.
The server goes through and says okay. This one is good enough. I'll take that one. The server
gets to choose. Let's use this protocol. Let's use RSA or let's use ECC or let's use Diffie-Hellman
and let's use this hash function and this symmetric cipher. I don't like DES anymore; I'll use AES,
whatever. They can choose. And here is the public key and the certificate and the nonce and
then okay. The public key is used to transmit a symmetric key and then all the subsequent
communication is with the symmetric key once you've done this exchange. The really nice
thing about this is it's very agile. We don't have to decide that next Thursday the internet is
going to change ciphers. Everybody has to shut their old cipher down and start using a new
cipher Thursday. It's very smooth. Servers can start accepting new ciphers. Browsers, clients
can start implementing new ciphers. If they're not yet understood by the servers, that's okay.
Eventually, old ones the servers will stop accepting. It's a smooth, gradual process. It's very
agile; it's very nice. It's a good way of doing this negotiation and this has allowed this protocol
to survive as new primitives come along and old ones lose favor. It works quite well. Once the
negotiation is done, you get into the SSL/PCT/TLS record layer which basically says from then on
you just use the symmetric cipher with the key that you sent over and you integrity check it
with a hash, which I'll talk more about in another session and you wind up with what is
sometimes known as hybrid cryptography. We use asymmetric crypto and its nice features for
things that we can't do like sending a symmetric key to somebody that I've never talked to
before. But the symmetric crypto is much, much much more efficient, so once we've done
those things which we can't do without asymmetric stuff, then we do pretty much everything
else symmetrically. And we have this hybrid of asymmetric wrapping the symmetric keys or
doing the digital signatures and the bulk stuff is done with the much faster symmetric stuff.
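The hybrid pattern just described can be sketched as follows. This is an illustration under loud assumptions: the modulus is built from two small, well-known Mersenne primes, the RSA is raw and unpadded, and a SHA-256 counter keystream stands in for a real symmetric cipher such as AES because the Python standard library has none. All names here are hypothetical.

```python
import hashlib
import hmac
import secrets

X = 65537
P, Q = 2**61 - 1, 2**89 - 1        # known Mersenne primes; toy sizes only
N = P * Q                          # server's public modulus
D = pow(X, -1, (P - 1) * (Q - 1))  # server's secret exponent

def derive(session_key, label):
    """Derive separate encryption and integrity keys from one session key."""
    return hashlib.sha256(label + session_key).digest()

def keystream_xor(key, data):
    """Toy stand-in for a symmetric cipher: XOR with a SHA-256 counter stream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def client_send(message):
    session_key = secrets.token_bytes(16)
    # Asymmetric step: wrap the fresh symmetric key with the public key.
    wrapped = pow(int.from_bytes(session_key, "big"), X, N)
    # Symmetric steps: bulk encryption plus a keyed hash for integrity.
    body = keystream_xor(derive(session_key, b"enc"), message)
    tag = hmac.new(derive(session_key, b"mac"), body, hashlib.sha256).digest()
    return wrapped, body, tag

def server_receive(wrapped, body, tag):
    session_key = pow(wrapped, D, N).to_bytes(16, "big")
    expected = hmac.new(derive(session_key, b"mac"), body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return keystream_xor(derive(session_key, b"enc"), body)

if __name__ == "__main__":
    packet = client_send(b"card number 4111 1111 1111 1111")
    print(server_receive(*packet))
```

Only the 16-byte session key ever passes through the slow public-key operation; everything after that uses the fast symmetric primitives.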
One thing I haven't talked about much and this is going to be, we're getting close to the end,
which is good because I'm getting close to the end of my slides, is authentication and
authorization. This is something which has been done woefully badly in the protocol that I just
showed you. The reason is when a merchant is on the other end of this conversation, gets my
credit card number, does the merchant really know who this merchant is talking to, that I'm the
one on the other end? No. Not with basic SSL/TLS. There is something called client auth that
can be used, but really never is. All they have is here's my credit card number. Okay. I will
charge this person. I will assume that this credit card number is good and go from there. But I
am my credit card number effectively. That's really not a really good way of doing things. Does
the merchant know that I authorized the purchase? No. The merchant just sees my credit card
number coming over the wire. If possession of a credit card number is used as a proxy for
identification, then I can be impersonated. My claim is this is the reason for things like the
Target breach. People will talk about well, Target had this failing of security, this failing of
security. No. The problem is right here. The problem is Target has data which allows an
attacker to grab and impersonate millions of people because all you need to impersonate
somebody is their credit card number. If we got rid of that there would be other sensitive data,
what purchases I've made in the last year, but people aren't going to go through too much
trouble to steal that stuff, at least not as much trouble. The real problem is we built this insane
infrastructure in which a credit card is everything and if you have my credit card number you
can impersonate me, or other things. Maybe my credit card number and my Social Security
number or my date of birth. I can find for probably most people in this room, find your date of
birth in less than 2 minutes and your driver’s license number. This is all public data, easily
discoverable. It doesn't cost anything. Ask me at the end and if you want I'll show you how to
do it. It used to be that your election records in this state, you could change your address with
your date of birth and driver's license number. It turns out the driver’s license number in this
state is deterministically computed from your date of birth, which is public. That's been fixed a
little bit. It's a little bit better now, but not a lot better. I had a discussion with the director of
elections a couple of years ago and we agreed. They have now added driver’s license issue
date, which is for most people within two months of the expiration, which is on a birthday that
is a multiple of five years. There's about six bits of entropy in there, better than nothing. Also,
if my giving a merchant a credit card number is a proxy for an authorization, then merchants
can cheat all they want. They can say you told me you wanted to buy this expensive thing. You
sent me your credit card number, right, so you must have. It's not really authorization. There
are ways that we can do better. We could use private authenticators like what people usually
do with a bank. My bank and I somehow agree on a password and this password is supposed to
not be shared with anybody else, supposed to be. Okay. That can work reasonably well for
authentication. My bank can't impersonate me to somebody else if nobody else recognizes
that password as having any meaning. It doesn't have any authorization value. The fact that I
gave my bank a password and gave it some instructions and it shows this transcript of all these
things that happened, doesn't mean that I was really on the other end. I can say you had my
password to begin with. You could have done that all yourself. How do you know I was
involved in that? And, of course, we all know the usability is horrible with passwords. People
can't remember passwords. People are supposed to have individual passwords with many,
many different entities, so they wind up having the same password and it just doesn't work
well. Biometrics is a possibility. This has some value for local authentication. It's really bad for
remote authentication and yet people like to use it for that. You see my fingerprint coming
down a wire and you think I must be on the other end as though it's some sort of a secret? I
leave my fingerprints everywhere I go. It's not a secret. It's a worse secret than just about
anything else you can imagine. But somehow seeing my fingerprint means that I'm there. I
don't know. That's really not a very good choice and, of course, biometrics are very difficult to
revoke. I'd like to keep my fingers, thank you, so treating biometrics as some sort of secret data
is often done and it's a terrible way to do things. It can have value for a local authentication.
What we would like to do is the digital signatures. I think we've got like two slides. Digital
signatures offer a good authentication mechanism. You can do a lot with them, but you need a
public-key infrastructure of some sort to do this. Amazon has gotten a digital certificate on its
key. How many people in this room have a public key that has been certified by some certifying
authority? Okay. A few, but not many. Until we find a way of getting there, we need the killer
app that will cause people to get certified. Some governments are issuing cards to citizens and
if you have this and a good way to do this kind of real authentication, then it's great, but until
we get there we're stuck with biometrics and passwords and broken systems. Another thing to
mention is digital signatures you can't do as a human being. You need computational systems,
but they can be used as authorization if you do it right. It's more than two slides but these are
quick slides. Sorry. Within SSL/PCT/TLS you can imagine you want to make a purchase. You get
something that says please sign this authorization and you go and say okay. Here's my
signature. Here's my certificate. There are still problems with this. It's not quite this simple.
You have replay attacks. That signature could be reused if you don't make sure that it contains
data that won't allow it so it should have nonces or something of that sort. There's also
malleability issues. Malleability is signatures can be transformed in various ways. Or remember
with RSA, signing something is exactly the same as decrypting something. So somebody comes
along and says please sign this authorization and what they are really saying is please decrypt
this for me. We have to build into the protocols ways of making sure that you never just sign
something that you're given, that the signer also has to put something into the thing being
signed of the signer’s choosing so it's not deterministic. And there are the human limitations of
dealing with just how do you do the computation. People can't digitally sign things without
computational assistance. The final thing here is we want to get some sort of hybrid
authorization where we authorize a device, our own computing device to authorize on our
behalf, but how do we do that? It changes the whole notion. Normally we talk about the
protocol, but instead, we should be talking about what a former Microsoft colleague used to refer
to as a ceremony. It's not just you and the merchant involved in the protocol. It's you and your
device and your merchant. There's an interaction between you and your device and there's an
interaction between your device and the merchant, whether your device is a smartcard or it's
your home PC or it's your phone or whatever it is. The protocol is going all through here and
there are all these places where there could be a break where this could be a perfect protocol
but the interaction between you and your device can be taken advantage of. It should be that
when you get to the point that you tell your device that you want to make a purchase and that
gets passed on, there's an authorization step here. There's an authorization step here and you
can enter something that you can do locally to your device, something reasonably simple. Your
device, then, does the digital authorization, but this all has to be taken into account as a whole,
not broken up piecemeal. That's kind of an overview of the kinds of things that I'll be talking
about. Subsequent sessions -- just when did there, but that's okay. Subsequent sessions we'll
be talking about symmetric protocols, asymmetric protocols, the sort of standard integer ones,
non-integer ones, especially elliptic curves, lattice-based systems, various protocol properties,
forward secrecy and then some applications and maybe if people allow me I'll sneak in some
election protocols at the very end. But if people want to hear about other things, if people are
less interested in some of these things, there are other things, let me know. I want to keep this
flexible and try to find a way to sort of fit whatever people want to hit on. By the way, all of
the other sessions, 1927, I apologize for the confusion. I think that must've been my mistake.
My office is building 112. My admin’s office is 112. Somehow that slipped in. But we are in 99,
but the future sessions will be in 1927. It's a smaller room that way, just around the corner.
Okay. That's it. Any questions? [applause]. Seny?
>>: You showed RSA? [indiscernible] version of RSA? Are you going to talk about semantic
security at all?
>> Josh Benaloh: Yeah. Maybe not in exactly that form, but yeah. The RSA that I showed,
nobody in their right minds would use that in practice. You have to be very careful about
exactly how you pad things in the actual use of RSA and I'll talk about that. Among other things,
to make sure that encryption is distinct from signatures and there is no possibility of confusion,
but there are a lot of things. The semantic security that you mentioned is one important
aspect of understanding exactly the kind of security that the RSA can provide. I do hope to talk
about that as well. Yep?
>>: That last example of the ceremony during authorization, is chip and pin credit cards an
example of that?
>> Josh Benaloh: Yes. Chip and pin does that. Now it doesn't solve all the problems because
chip and pin typically the way it's used credit card numbers still get transferred and maintained,
so it's better, but chip and pin, which is done a lot in Europe, and there is now some
expectation that they will come to the U.S. But people don't know; basically it's not enough to
swipe a credit card, you have to have a pin also. There is a pin on the credit card and the credit
card is supposed to engage in a more sophisticated protocol than the usual one of here I am.
Here's my number. Anybody want my number? Which is what many of our cards do, like our
[indiscernible] cards. Basically, you ping them. Here's my information. We want to do better
than that. We can do better than that. Chip and pin is a little bit better than what we're doing,
but it's not what we should be doing. Yeah?
>>: You did a talk about some of the tricks, algorithm in AES. Were you talking about
>> Josh Benaloh: Yes. Actually, I debated whether or not to put in cipher mode today and I
decided we are already over time so I'm glad I didn't. We'll spend plenty of time talking about
cipher modes.
>>: So we can go until three?
>> Josh Benaloh: The usual way this is done is in the hour of talk, plenty of time for questions.
We are flexible so, if people would like hour and a half talks, I can talk ad nauseam, I promise.
But the tradition, anyway, is it's on the calendar until three o'clock so that there is some
flexibility at the end. I'm not going to talk about cipher modes now. I'll talk about them next
time unless you really want something, talk to me later.
>>: I'm curious. There's a lot of government databases that hold very sensitive information
regarding individuals or medical purposes. How good are those?
>> Josh Benaloh: What's the quality of medical databases and other similar databases? It's
very mixed. Some are pretty good. Some are not nearly as good as they should be, but there
are problems as to just what's in there. Usually, there's a lot more in there than needs to be,
than should be. Medical databases are kind of an exception because that's very sensitive stuff
that really does need to be there for the most part. That's a case where we have to try to be
very careful and instead not expose them as much as they tend to be exposed. They need
some extra security. As opposed to Target which probably doesn't need the extra security.
What they store is only what you bought in the last two years. It's still kind of sensitive, but it's
a lot less sensitive than what medical tests I've had in the last two years. There isn't a good
answer to that. It's not great. We can talk about some methods that are used, but you
probably don't want to know all the details. [laughter].
>>: Actually I do.
>> Josh Benaloh: Okay. We can talk little bit. Honestly, I don't know all the details. I know
some of, but not all of it. Yeah?
>>: [indiscernible] about the exponentiation where you multiply multiple times to go to n?
>> Josh Benaloh: Okay. You know what? I'm going to try to do it this way.
>>: While Josh puts up that slide, anybody here who hasn't signed that spreadsheet, then on
your way out the door please do.
>>: It's on the back table now?
>> Josh Benaloh: Oh no. I hit the wrong key. I hit F5 instead of shift F5. This one?
>>: Yeah.
>> Josh Benaloh: Okay. You want to go a little further.
>>: Where it was simplified.
>> Josh Benaloh: So let's do exponentiation.
>>: Yes. If someone knows how many times you exponentiation and how many times you
added, is it hard to reach 2n or is it computationally [indiscernible]
>> Josh Benaloh: If it's just the number of times, then you are pretty secure, because there
aren't that many possibilities. The number of times you -- this is square and multiply here. You
are always doing the same number of squares; basically, you should always be doing the same
number of squares. The multiplies, the number of multiplies that you do should be clustered
tightly around half of the number of squares and knowing that allowed you to actually figure
out somebody's key, there aren't that many possibilities you could guess that and try it. The
problem though is that sometimes if you are close in you can actually tell the difference. The
usual approach is not to do all the squares and then do a bunch of multiplies, it is while you are
doing it, you square and then multiply or square and then not multiply and if you are not
careful in your implementation, then there's a side channel attack and probably I should put on
the list side channel attacks, and spend some time talking about some of those. But
there's a side channel attack which is just sort of a timing attack or power analysis or listening
to, if you can get close enough to a device it can even be done acoustically, listening to it. You
can record it and listen and tell the difference between a multiply and a not multiply and that
way you get the bits of the key one at a time. That would be very detrimental, so we are very
careful in our implementations. One of the reasons you should use the Microsoft internal
approved implementations and not roll your own is we're very aware of side channel attacks
and try to build in as much resistance as we can. Anything else?
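A minimal sketch of the square-and-multiply loop discussed in this answer, written deliberately in the careless style that leaks. The operation trace is a stand-in (an assumption for illustration) for what an attacker might reconstruct from timing, power, or acoustic measurements; the function names are hypothetical.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right exponentiation that records each operation it performs.

    Every exponent bit costs one square ("S"); a 1 bit costs an extra
    multiply ("M"). The trace models what a side channel could reveal.
    """
    result = 1
    trace = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        trace.append("S")
        if bit == "1":
            result = (result * base) % modulus
            trace.append("M")
    return result, "".join(trace)

def recover_exponent(trace):
    """Read the exponent back from the trace: "SM" is a 1 bit, lone "S" is 0."""
    bits = trace.replace("SM", "1").replace("S", "0")
    return int(bits, 2)

if __name__ == "__main__":
    modulus = 1009 * 1013            # toy modulus, for illustration only
    # The public exponent 65537 has only two 1 bits, so only two multiplies.
    _, public_trace = square_and_multiply(5, 65537, modulus)
    print(public_trace.count("S"), public_trace.count("M"))

    # A secret exponent leaks completely if the operations are distinguishable.
    secret = 0b1011001110001
    _, leaked = square_and_multiply(5, secret, modulus)
    print(recover_exponent(leaked) == secret)
```

Knowing only the counts of squares and multiplies reveals little; it is the order of the operations that gives away the bits one at a time, which is why hardened implementations avoid a data-dependent multiply.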
>>: If you go back about two slides to the slide that had the 2048 on it. This one I thought
when you said the crypto board recommends that all three be 2048 signatures. Is that what
you said? Is that what you said? One of those everybody uses 65537 and so can you explain
>> Josh Benaloh: These two need to be 2048 bit. Sorry, not these, these two.
>>: [indiscernible] 65537.
>> Josh Benaloh: It's n that's important here. Y is the message, so y will be padded separately.
It's just a message. The message means that the computations will be done on 2048 bit values.
It's just how many computations and it turns out that with RSA as long as you're a little bit
careful, a small exponent seems to be just as good as a large exponent. You get some
interesting effects out of this and it's worth mentioning. This exponent is going to be public
anyway, so you're not hiding anything by, or revealing anything by using this special form. The
question is, does it make it easy to attack? It turns out that if you want something that is of this
special form 1 all zeros and a 1, there are very few possibilities that meet all of the
requirements. Three actually does, but there is some reason not to use three. There are some
attacks on very, very small exponents if they are used carelessly. Basically, you could get
something that doesn't get large enough to wrap mod n and if you pad very badly and you just
use raw RSA, you don't wrap mod n, then it just becomes taking cube roots. So we want
something that's small, has this structure and sort of 65537 has turned out to be the perfect
value and virtually everybody uses it. It leads to this interesting asymmetry in asymmetric
encryption with RSA that encryption is much, much faster than decryption, a couple of orders
of magnitude faster. Verifying a signature is much faster than signing something because the
verification is with a small exponent. The decryption which is root finding is, it's also just an
exponentiation. It just turns out to be a large exponentiation. You use a full size -- in that
case the exponent becomes 2048 bits here. You wind up encryption and decryption are using
basically the same function, but the exponent is much larger for that secret operation and
therefore, encryption is fast; decryption is slow. Often you can use that to your benefit.
Sometimes you wind up on the wrong side of it. You end up with a weak client having to do the
expensive operation and that's a nuisance. We try to formulate things when we can so that it's
the other way around, but we can't always do that.
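The cube-root problem with exponent 3 mentioned in this answer can be sketched as follows. The modulus is a hypothetical stand-in built from two known Mersenne primes; only the public encryption direction is exercised, so it is not meant as a valid key pair for exponent 3. The "padding" shown is just a high bit set to force wrapping, not a real padding scheme.

```python
def integer_cube_root(value):
    """Largest integer r with r**3 <= value, found by binary search."""
    low, high = 0, 1
    while high**3 <= value:
        high *= 2
    while low < high:
        mid = (low + high + 1) // 2
        if mid**3 <= value:
            low = mid
        else:
            high = mid - 1
    return low

# Stand-in public modulus (about 150 bits); its factors are not used below.
N = (2**61 - 1) * (2**89 - 1)

# A short, unpadded message: its cube is smaller than N, so it never wraps.
m = int.from_bytes(b"4111", "big")
c = pow(m, 3, N)
recovered = integer_cube_root(c)   # no secret key needed

# The same message with a high bit set so that the cube does wrap mod N.
padded = (1 << 140) | m
c_padded = pow(padded, 3, N)

if __name__ == "__main__":
    print(recovered == m)                          # plain cube root works
    print(integer_cube_root(c_padded) == padded)   # wrapping defeats it
    print(bin(65537)[2:])                          # a 1, fifteen 0s, and a 1
```

With 65537 in place of 3, even a very short message is raised to a power large enough to wrap mod n, which is part of why that exponent is the usual choice.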
>>: Will you be going into this in a lot more detail on your third talk?
>> Josh Benaloh: Yeah, the third talk is when it's most likely. Yep. And I do intend on showing
you how we compute these things. If you like math that's the talk to come to. Anything else?
Okay. Thank you. Be sure to sign the sheet if you want a chance at the trinket. [applause]