>> Helen Wang: Good morning, everyone. It's my great pleasure to
introduce Joe Bonneau. Joe is a PhD candidate from the University of
Cambridge, and he has been working with Ross Anderson for the past few
years. And today he is going to give a talk on guessing human-chosen
secrets.
>> Joseph Bonneau: Great. Thank you. Right. So, just see if I have my
slides advancing. Right. So, I'll try and pre-empt the question I
normally get at the end which is why are we doing password research at
all? This is a picture of what I identified as basically the first
password deployment in 1961 at MIT. They actually wrote a retrospective
last year for the fiftieth anniversary of it, and one of the graduate
students who worked on it admitted that he was the first person to
actually compromise password security by guessing people's passwords.
What was actually the most interesting thing to me about reading it is
that the threat model was completely different then. The main reason
that they deployed passwords in the first place was to try and segment
the very limited resources of computing time. And this guy, Allan
Scherr, who admitted doing it, said that the only reason he guessed
people's passwords was so that he could allocate more time to his jobs
and try and finish his research sooner. There's also a lot of other
really good tidbits in the retrospective. They had a race condition
where the print daemon would print whichever file was most recently
accessed, and if somebody logged in, in between the time when somebody
sent the print command and when the printer actually printed it, which
was a very small time window, it would actually print the password file
which wasn't encrypted then. So this happened a couple of times where
they accidentally sent the entire master password file to the printer,
and they had to reset everybody's passwords.
Anyway so that was fifty years ago, and sadly we basically are dealing
with a lot of the same problems today.
>> : [Inaudible].
>> Joseph Bonneau: I guess we have lost ties and we've lost some of the
hairstyles that you can see here, but basically passwords have stayed
with us. So this is a small snapshot of some work I've done in the past
year with Cormac and two others looking at basically why has everything
that has been proposed to replace passwords failed? So we tried to
really zoom back and say what are passwords actually good for? So we
made a massive list -- Oops, not a laser pointer. We made a big list of
every property that we would like an authentication system to have, and
then we tried to score passwords objectively and say passwords do have
some nice properties. You don't have to carry anything. At this point
they're easy to learn. Everybody knows how to use them. We know how to
reset them. They're compatible with everything in the world. They're
basically free to deploy, and there's nothing to be stolen. You can do
them without trusted third-parties, so they have some security
properties that are okay.
So then we tried to look at the best examples of everything that's
been proposed with the claim that, "This can replace passwords and
solve all of our problems." And when we scored those systems using the
same evaluation criteria, we found universally that every replacement
makes a couple of incremental improvements. So this is the Firefox
Password Manager. It makes things more scalable for users, say,
eliminates some errors due to typing passwords, but it's now not as
easy to recover if you lose your password, it's not compatible with
other browsers, and it sort of no longer has this nothing-to-carry
universality of passwords.
So we said, okay, there's a few marginal gains and a few marginal
losses from switching to something like that. And we repeated the
exercise for lots of other things: graphical passwords, doing
authentication by sending SMS messages to a phone, trying some
biometric like iris recognition, doing some remote single sign on
scheme like OpenID. And it was always the same story, that you can't
replace passwords in a strictly win-win situation; you have to give up
some of the properties that we've become accustomed to with passwords.
All of which goes to say, a little bit, that passwords are basically
the only show in town right now for the immediate future. We don't know
really how to replace them in a way that will keep everybody happy. And
particularly since somebody has to lose or something has to become
harder for somebody if we replace passwords, it's very hard to break
the status quo. So I do think that passwords in their current role will
be with us conservatively at least for five years; I think it would be
very, very hard to imagine passwords disappearing on the web.
So if you're interested in reading more and seeing the entire table,
which doesn't fit on one slide, you can check out this paper with
Cormac and Frank Stajano and Paul van Oorschot, which will be at
Oakland in a couple weeks.
And longer term, taking another step back, the current situation we
have, say, N users and M servers that some users might want to talk to,
and every single connection that gets made users have to register
another password. So we have this big messy bipartite graph. And this
causes all sorts of problems: users end up reusing passwords;
compromises at one server can affect another server; there are too many
passwords being demanded of users. All the familiar laundry list of
complaints about passwords.
So what would be really great is if we could switch to having one
intermediary in the middle. This is the classic computer science trick
of add a layer of indirection. Now everything looks really good. Every
user just has to memorize one thing, and every server just has to
maintain a connection with this one trusted intermediary. Of course
there's a couple problems with this. This is basically the Microsoft
Passport proposal from about ten years ago. It's also Kerberos. Today,
reincarnated, it's Facebook Connect. And of course there's problems that
some users aren't going to want to trust this middle server, that
server becomes a point of failure and all these things. So in the best
case maybe five, ten years down the road we'll end up in some situation
like this where users have their choice. They might use some different
remote server of their choosing. They might trust their mobile phone to
intermediate everything for them, or they might use some weird gadget
that we haven't invented yet.
And there still is going to be a really messy N-to-M bipartite graph
down here, but we more or less can solve that problem. The protocols
that underlie things like OpenID, which would enable the entire world
of web servers to rely on any number of different identity providers,
that's a relatively solvable problem. But we still have the problem of
authenticating users just to the one trusted device, whether it's
their OpenID provider or their phone or basically anything else. I've
yet to hear a really credible proposal that doesn't push this to having
one last password that people remember.
Even in cases of trusted hardware and even if there's some biometric
capability, there are still problems of devices getting stolen and
biometrics getting faked. So people say, "Well, we should add a
password to the system to take care of that one situation." So in a way,
long term, the thing I'm most interested in is how we solve this
last mile of authentication from the user to one device which then
hopefully can unlock the world for them in a sane way. So this gets rid
of, hopefully, a lot of the current generation of problems that we have
with malware and phishing and people reusing passwords across sites.
But we still have the fundamental problem that people have to memorize
something that they then spit back to some trusted device that it's
possible for adversaries to guess.
So the bulk of my talk will be about this problem, abstractly,
of how hard it is to guess secrets that people have remembered. I came
into this a couple years ago, and it's turned into the topic of my PhD
work. So these are a couple of questions that hopefully by the end of
this talk I will have convinced you of an answer to; I don't think
there was a good answer, or even really a sound way of answering this
question, when I got into this line of research.
So for fun we can do a show of hands. If we're talking just about
passwords, how many people think it's easier to guess passwords chosen
by teenagers compared to the elderly? So teenagers first, and the
elderly. Okay. So pretty split room. We'll have the answer later. And
how about a four-digit PIN that a human has chosen versus a random
three-digit PIN? Who thinks the human-chosen four-digit PIN is better?
>> : Are you talking about the weakest or the average?
>> Joseph Bonneau: I'm leaving it vague for now.
>> : [Inaudible]...
>> Joseph Bonneau: Well, you can think if you were an adversary and you
were doing some guessing attack, which situation would you rather be
in? So everybody likes the random three-digit PIN?
>> : [Inaudible] I'll take [inaudible].
>> Joseph Bonneau: You'll take the human-chosen?
>> : Yeah.
>> Joseph Bonneau: Okay. So one brave person. And how about a password
versus a random surname chosen from the population? So unknown person
and you have to guess their last name, who thinks that's harder than a
password? One person. Most everybody else thinks the password's better.
>> : It depends here on the culture, right? I mean, certain
>> Joseph Bonneau: Right.
>> : cultures are easier to guess surnames
>> Joseph Bonneau: Definitely. So, yeah, actually in my thesis there's
a whole table on different cultures, and there's some real outliers.
Like in South Korea there's three surnames that cover more than half of
the people. So if you're dealing with Korean people, the surname is,
yeah, quite easy to guess. So if you want I can add information and say
we'll take the population of users on Facebook who come from around the
world, a lot of Americans. But still everybody likes the password or we
had one or two for the...
>> : I pick the surnames.
>> Joseph Bonneau: Okay. Well, we're not keeping score but we'll find
the -- we'll get to that at the end of the talk. All right, so a little
bit of historical background on how people have tackled this problem
before. There's the approach of looking at the content of passwords and
looking at things like, well, how long are they? What different types
of characters do they have? Do they have things outside of letters and
digits? And a lot of people have done surveys of this by hand from
passwords chosen in user studies in the lab. And there's a guideline
that was put out by NIST in 2005, I think, which they never really sold
as a bulletproof rule. This was kind of just a rough rule of thumb, but
it's been used in a lot of usability papers since then. So it's become
important at least in the research context which tries to answer the
question of, "If you have a password of a certain length and with
certain types of characters, how strong is this?" And on the right side
over here, these white cells are for truly random choice, so within
the space of one character from a 94-character alphabet this is how
many bits you get. And then these are the estimates for human-chosen
under a couple of different restrictions. So if they can only choose
numbers, if there's a dictionary check in place, if they're required to
use non-alphabetic characters, stuff like that. So not very exciting,
just a big table without a lot of motivation behind it.
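[Aside: the NIST rule of thumb referenced here can be sketched roughly as follows in Python. The per-character values and bonuses are recalled from the SP 800-63 guideline, not from the slide, so the exact numbers may differ.]

    # Rough sketch of the NIST SP 800-63 rule-of-thumb entropy estimate for
    # human-chosen passwords; values are approximate and recalled from the guideline.
    def nist_entropy_bits(password, composition_rule=False, dictionary_check=False):
        bits = 0.0
        for i in range(len(password)):
            if i == 0:
                bits += 4.0      # first character
            elif i < 8:
                bits += 2.0      # characters 2 through 8
            elif i < 20:
                bits += 1.5      # characters 9 through 20
            else:
                bits += 1.0      # characters 21 and up
        if composition_rule:
            bits += 6.0          # bonus for requiring uppercase and non-alphabetic characters
        if dictionary_check:
            bits += 6.0          # bonus (up to 6 bits) for passing an extensive dictionary check
        return bits

    print(nist_entropy_bits("password1"))   # 4 + 7*2 + 1.5 = 19.5 bits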
The other approach to evaluating passwords is just to run some cracking
library against them, and there's tons of cracking libraries out there
now. John the Ripper is the most popular in research studies because
it's free; you can get it on the web. There's also a whole ecosystem
that exists of proprietary password cracking software which you can
spend 14,000 pounds on if you're interested -- that's at the discounted
price -- which I've never really looked at. But a lot of this is out
there. What's interesting I think is that they have basically two
business cases for selling this stuff which are employers trying to
recover the passwords of their own employees and sort of jaded spouses
or parents who are really interested in one person who they share a
computer with. So a very strange world, but it does actually show up in
-- Sadly it shows up in research papers sometimes that they say, "We
used a certain proprietary scheme," which then makes the entire result
kind of useless because you have no idea how good or bad this scheme
was.
So here is a graph of some past cracking results that have been
published over the years. So down here this is proportion of passwords
that were broken in the cracking attack and this is the size of the
dictionary plotted logarithmically. So that's actually masking a pretty
massive loss of efficiency as you go here because this is basically a
linear scale, and this is -- you're doing logarithmically more work. So
if this is a linear axis too it would obviously be a huge, huge ramp up
in the amount of work you have to do. But the interesting bits here I
think are that these cracking attacks they very rarely ever get past
the border of breaking about half of accounts that the researchers had
access to.
The one exception is this result from 1979, which was with users from
the original UNIX days, by Rob Morris and Ken Thompson, who actually got to
85%. Well, that was in the days of crypt and pretty simple passwords.
Okay, so in comparing these approaches and thinking about what we would
like in a way to evaluate passwords, I would say that the semantic
evaluation formulas, like the NIST formula, are lacking in validity a
little bit because it's not really clear that the number that you get
out of that formula corresponds to how hard passwords are to guess in
real life. Whereas the cracking evaluations are pretty good here
because the attackers use probably roughly the same software that
researchers can get their hands on.
The problem with cracking attacks is that they have -- they're biased
toward how skillful the researchers are in using them because these
things are extremely difficult to set up. You have a lot of choices to
make, which word list you use, which mangling rules you use, and
there's also some demographic bias because if the passwords that you're
studying were chosen by a different population than the word list that
you feed into your cracking algorithm, you'll do a lot worse. And it
doesn't necessarily mean that the passwords are weaker. So if you run
the stock John the Ripper password cracker against passwords chosen by
users from China, it might look like the passwords are much stronger
than those of users from an English-speaking country. But it's not clear if
that's actually true or if it's just a worse fit for the cracking
software.
And the semantic evaluations I think at least there's no bias for how
skilled the operator is because it's a deterministic formula, but the
demographic problem creeps in once again. And the further problems with
the cracking evaluation in terms of good science and doing long term
longitudinal comparison of results, it's very hard to repeat old
password cracking experiments because the cracking software gets
updated all of the time. And people rarely report enough information
about exactly how they compiled the software, which word list they
used, which mangling rules they use.
And it's not free to run the cracking tools either. Some of the
research that's been done in the past year or two at Carnegie Mellon,
where they've done a lot of cracking against passwords that they've
collected in user studies, they've actually had to devote a pretty
significant amount of computational resources to doing the cracking
evaluations which is doable but it's not free like doing the NIST
entropy formula is.
So my goals going into this space were to try and fix some of these
problems with basically the secret weapon of having a lot of data. So
in the past couple of years there've been a couple of big leaks of
password data sets from different websites on the web. The biggest one
was from RockYou in 2009 which had thirty-two million users, and the
passwords were leaked in plain text. So this was kind of Christmas
morning for people who are interested in doing password research
because you could look exactly and say exactly how many people chose
"123456" and on down the line. So this data set has shown up in dozens
of papers over the years. I've used it quite a bit, and it is really
very useful for doing these things. So we'll come back to the RockYou
data in a second.
My goal for the rest of this talk will be to say if we have a big data
set like the RockYou set, can we develop some purely statistical
metrics that don't assume anything about what the passwords are or what
they mean; they just look at this column on the left here which says,
"The most common word was chosen this many times and the second most
common password this many times," and so on? So this is the histogram
of popularity of passwords. Can we develop some metrics that rely only
on those statistics that will be useful to us? And if it's possible to
do this then we can collect data in a privacy-preserving way, which is
nice, and which will let us collect a really, really big data set.
All right. So I'll briefly descend into some more theoretical framing
of the problem which is that we'll assume passwords are just a
probability distribution that we know completely. So when a user
chooses a password they're just doing a random draw from some
distribution, and we know exactly what all of the probabilities are.
And in this model we can develop some metrics for a given distribution,
how hard is it for an attacker in different circumstances? So the first
port of call is to compute the entropy of this distribution. So this
has been around for decades and it's been -- Excuse me. So entropy can
be used to measure the uncertainty that we have about some unknown
value drawn from a distribution, but it doesn't actually have to do
with sequential guessing in the sense that we're interested in, where the
attacker has to guess, "Is the value this? Is the value that?"
What entropy would measure is if an attacker could say, "Does the value
come from this set? No. Does it come from this small set?" and so on.
Entropy actually tells you exactly how many guesses you need on average
to win that guessing game where you can guess whole sets. Which is
basically if you ever played the board game "Guess Who," you may have
figured out the optimal strategy of course it just to do a binary
search. And you can, of course, the distribution's not even then you
develop a Huffman code and you search over the bits one at a time that
way. Unfortunately in the case of passwords or basically anything else
chosen by humans you very rarely get to guess whole sets at a time. You
have to guess one element.
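[Aside: in symbols, the distinction being drawn here (notation added for reference, not from the slides) is between the Shannon entropy

    H_1(X) = -\sum_i p_i \log_2 p_i ,

which characterizes the set-guessing game -- with a Huffman-code strategy the average number of yes/no questions lies between H_1 and H_1 + 1 -- and sequential guessing, where candidates must be named one at a time.]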
So an alternate metric that has been proposed is called either
guesswork or guessing entropy. And this is really simple. It just says
if you guess the passwords in order of probability, what's the expected
number of guesses that you'll need before you succeed? So just a simple
summation here. So this looks pretty good; this looks like what we
want. The problem is that when you dig into the RockYou data set, for
example, you find a lot of stuff in the tail of it which certainly
looks like random 128-bit hex strings. So about one in a million people
within the RockYou data set chose something that looks like it was
drawn randomly from a 128-bit space. Okay? So the implications of that
through a relatively easy to prove Lemma that I'll skip over here: if
our distribution is a mixture distribution of two different things, the
guesswork of the mixture has to be at least -- or it's bounded by the
guesswork of the two component distributions weighted by what weight
they have in the mixture. So the punchline here is that if you're guessing
against the RockYou set, with probability one in a million you end up
guessing against one of the people who's chosen a password from a
really big space. And when that happens it takes you on average 2 to
the 127 guesses to succeed.
And when you multiply those together you can see that even if you
guessed everybody else's password instantaneously, it would take you on
average more than 2 to the 107 guesses to guess a RockYou password. And
this is without assuming anything about how everybody else chose their
passwords. And I think we'd probably all say that 107 bits is not
really a useful number about the RockYou data. There's no way that
people's passwords are actually that strong.
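[Aside: the back-of-the-envelope arithmetic behind that 107-bit figure, sketched in Python; the guesswork definition is the summation just described.]

    import math

    # Guesswork (guessing entropy): expected number of guesses when trying
    # candidates in decreasing order of probability.
    def guesswork(probs):
        probs = sorted(probs, reverse=True)
        return sum(i * p for i, p in enumerate(probs, start=1))

    # Even if every "ordinary" user cost zero guesses, the roughly one-in-a-million
    # users with 128-bit random passwords dominate the average on their own.
    lower_bound = 1e-6 * 2.0 ** 127
    print(math.log2(lower_bound))   # about 107 bits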
So what's going on here really is that the outliers in the distribution
completely overwhelm the statistic and make it meaningless for us. I
see a couple of slightly perplexed looks.
>> : So this is assuming that you pick one username and you just devote
all of your energy to going after that password?
>> Joseph Bonneau: Yeah, exactly. Yeah, so what Cormac said is exactly
the right observation to transition to a metric that makes a little bit
more sense.
>> : This is just taking the average of a very heavy tail distribution.
>> Joseph Bonneau: Exactly. Yeah.
>> : So I guess what's perplexing me is a lot of people study min-entropy which is like the right measure for password [inaudible]...
>> Joseph Bonneau: Exactly. Yeah. Okay, so yeah. We're getting there in
a few slides. Perfect. Okay. So as Cormac said the guesswork assumes
that we're going to guess forever. But in reality what's more likely is
that we have some big set of values, and we'll be happy if we can guess
correctly on some of them but not all of them. So if we're guessing
surnames, we have M things to try. A dumb strategy would be to exhaust
all of your effort on the first name before you go to the next one
which may lead you to guessing some really weird surnames that only a
couple of people in the world have. Obviously we want to guess the most
likely thing for everybody first and then the second most likely thing
and so on. And we'll be able to hopefully succeed without ever getting
into the heavy tail of the distribution, the region where people have
picked the really strongly random stuff.
So it's fairly simple to come up with a few metrics that actually were
proposed in the information theory literature in the nineties. We
basically just have to model when our attacker's going to give up, and
we can do that either by saying he'll give up after a constant number
of guesses. So this is the beta success rate. Or, we can say that the
attacker has some desired success rate that they want to get to and
they'll give up after that. So basically if the attacker wants to have
a 25% chance of success, they'll know exactly the number of guesses
they need to get to that. And they'll try that number of guesses on
every account.
So these metrics I think are quite good. Min-entropy was proposed; min-entropy would basically be if you're limited to one guess per account.
But this is an extension of min-entropy if you do more than one guess
per account. The only thing that's not quite right about this metric is
that this measures the size of the dictionary you need, but you won't
actually have to use your whole dictionary on every account. So if I
told you that there are a set of one million passwords that
collectively cover 25% of the users in the RockYou case then you could
compute this metric exactly. And you could say, "Great, if my desired
success rate, alpha, is 25%, I'll need a dictionary of size a million."
But you won't actually have to do a million guesses per account because
you'll hit early a lot of the time.
So a new metric proposed by me is to scale down this required
dictionary size a little bit by saying that in the cases where you
succeed, you'll require less than the full dictionary size. And you
just do a partial summation which is like the original guesswork
metric. So I've basically taken this guesswork or guessing entropy idea
and scaled it down by saying, "The attacker's only going to guess until
they get to a certain probability of success." Yes?
>> : So you said you'll guess early a lot of the time?
>> Joseph Bonneau: Yeah, you'll succeed
>> : Which is...
>> Joseph Bonneau: early.
>> : A lot of the times. That's counter to the fact that it's a hundred
and something bits of entropy because a lot...
>> Joseph Bonneau: So, yeah. I guess...
>> : You would guess early rarely.
>> Joseph Bonneau: So you -- I think we may be thinking of two
different things for guess early. I'm saying that within the accounts
that you successfully guess, you'll almost always guess them not with the
last guess from your dictionary but, you know, one of the earlier
guesses. And when that happens then you get to quit.
Right, so this is basically a generalization of the previous guesswork
metric where if you set alpha equal to one, say that the attacker
desired success rate is 100% and they're going to guess forever until
they succeed every time, you get to the old guesswork metric, which I
argued was kind of useless because it gets skewed by the uncommon
stuff.
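[Aside: a minimal Python sketch of these metrics, computed from nothing but the frequency histogram. The variable names are illustrative; the exact definitions should be checked against the paper.]

    # counts: the histogram, e.g. [404, 390, ...] occurrences per distinct password
    def partial_guessing_metrics(counts, alpha=0.25, beta=10):
        total = sum(counts)
        probs = sorted((c / total for c in counts), reverse=True)

        # beta-success-rate: expected fraction of accounts broken with beta guesses each
        lambda_beta = sum(probs[:beta])

        # alpha-work-factor: smallest dictionary covering at least an alpha fraction of users
        cum, mu_alpha = 0.0, 0
        for p in probs:
            cum += p
            mu_alpha += 1
            if cum >= alpha:
                break

        # alpha-guesswork: expected guesses per account when the attacker stops after
        # mu_alpha guesses; successes quit early, failed accounts cost the full mu_alpha
        g_alpha = sum(i * p for i, p in enumerate(probs[:mu_alpha], start=1))
        g_alpha += (1.0 - cum) * mu_alpha

        return lambda_beta, mu_alpha, g_alpha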
All right. So this is a quick plot of some PIN data, and so we have
either the dictionary size or the required number of guesses as you
increase your desired success rate. And the dark black line here, this
is a random four-digit PIN. The dotted line is a random three-digit
PIN. And the blue is what people actually chose.
And you can see that for lower success rate the required dictionary
size or the expected number of guesses are roughly the same. But for
high success rate the expected number of guesses is much lower than the
dictionary size. This is because the ability to quit early becomes, you
know, increasingly useful. So I think this is in a sense the right plot
and the right way to think about it. The problems are that it's
impossible to see what's going on here. All of the information here is
basically crushed, and it's a little bit hard to reason about the
difference between these two uniform distributions. They have a
different slope, but it's not easy to visually pick out exactly how
strong those are.
Oh, and if you're interested in seeing where I got the PIN data, I had
a publication at Financial Crypto this year that was all about trying
to figure out how people pick PINs.
Right. So back to what I was saying, to make this graph a lot more
useful what we want to do is convert everything into bits. I think bits
are the scale that security people and crypto people are used to
reasoning about, and they're also logarithmic which I'll show you in a
second why that's so nice.
So I think that this is probably one of my less-interesting slides so
I'll just show you the formulas and say that they're there. They're not
super complicated. You move things around a little bit and you take a
log. And you can convert these metrics into bits. With the nice
property that for uniform distributions every one of these metrics will
give you the same strength as measured in bits.
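[Aside: the bit conversions, as reconstructed here from the published paper rather than from the slide (the normalization of the guesswork one in particular should be double-checked); each reduces to log2 of N for a uniform distribution over N values.]

    import math

    def beta_success_rate_bits(beta, lambda_beta):
        return math.log2(beta / lambda_beta)

    def alpha_work_factor_bits(mu_alpha, lambda_mu):
        return math.log2(mu_alpha / lambda_mu)

    def alpha_guesswork_bits(g_alpha, lambda_mu):
        return math.log2(2 * g_alpha / lambda_mu - 1) - math.log2(2 - lambda_mu)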
Right. So visually back to this picture, if we convert to bits we get
this. So now two uniform distributions are flat lines which is nice. So
we can see that the three-digit PIN is about ten bits; or if we use a
scale of dits, it's equal to exactly three. And the random four digits
is up here also a flat line. And our expected number of guesses is
here, and you can see with a lower desired success rate it's almost the
same as the required dictionary size. And then it starts to peel away
and for alpha equals one, we get the traditional guesswork metric. And
the Min-entropy is down here. It's basically an attacker with a desired
success rate of zero or any epsilon just slightly greater than zero.
For them the difficulty is the Min-entropy. And the Min-entropy is
extremely simple. It's just the log of the probability of the most
common event in the distribution, and it serves as a lower-bound.
Everything from this point on has to be higher than the Min-entropy.
So, yeah?
>> : Actually, it's quick. First, so this is the data of the PIN number
summation?
>> Joseph Bonneau: Yeah.
>> : Okay. Why is the curve going down? Like in the left-hand side
there. Like...
>> Joseph Bonneau: Here?
>> : Yeah.
>> Joseph Bonneau: Why is the curve going down?
>> : I feel like as it's going to the right there there's a part where
it's actually sloping down. [Inaudible]...
>> Joseph Bonneau: I think it's flat right here.
>> : Oh, it is flat? Okay. Just -- Okay.
>> Joseph Bonneau: Okay.
>> : It's never going down on us.
>> Joseph Bonneau: Yeah, it's never going down, but it will be flat
over a small region which is -- I think in this data set the most
common PIN had a probability of like 1.5%. So if your desired success
rate is anything up to 1.5% then it's flat in that region because
you'll have to do one guess. And it's sort of -- You can see a couple
of steps and then it becomes smooth as the events have, you know, low
probability. All right.
>> : There's sort of an uptick there at, you know, .35 or something. Do
you have any idea what that is?
>> Joseph Bonneau: I don't. I mean it's probably -- So the PIN
distribution and when we studied PINs, I mean, people choose PINs by
a lot of different strategies. So there's, you know, people who choose
something really -- the easiest thing they can remember, if it's 1234
or, I think, 2580, which is like a straight line down the PIN pad. And then it
transitions into people picking birthdays, so you have sort of, you
know, a uniform distribution over a smaller space and then you get to
other stuff. So if I had to guess this is probably roughly the really
common PINs, maybe the range of dates because I think it was about 20%
of people who chose dates. And then you drift into people who start
actually doing pretty sensible stuff like picking an old phone number
which may yield some other attacks but is basically random.
>> : What was the population?
>> Joseph Bonneau: This population was actually iPhone application
users.
>> : In the U.S. or worldwide?
>> Joseph Bonneau: Yeah, U.S.
>> : Okay. The four digits might correspond to a telephone number.
>> Joseph Bonneau: Right. Yeah. Okay, so this generally makes sense.
The goal is to recreate this diagram for passwords and in particular to
be able to do it for different groups of users to see how they compare.
Right. There is some theoretical stuff that I think I'll skip in the
interest of time. Okay. And I guess as a quick summary, I argued that
Shannon entropy doesn't apply to the guessing we're interested in, the
guesswork measure is skewed, so we want to use this parameterized
guesswork metric. But the other metrics I mentioned can be useful in
some cases. It's not possible to compute this guesswork metric for old
results that are published, so a lot of old results will basically be
reported as, "We tried a dictionary of size N, and we broke this
percentage of accounts," which can be converted into the required
dictionary size metric. But we can't figure out the expected number of
guesses per account because they haven't reported how difficult it was
at each step along the way. And for reasoning about really limited online
attacks, like if somebody only has three guesses, then obviously we'll
want to use the metric that just says, "How much success can you have
with three guesses?" or whatever the rate limit is. And in that case
the gain from switching to the expected number of guesses is
very, very low.
Okay. So given that I just want to look at the histogram of how popular
things are, this let me set up an experiment to collect passwords in a
privacy-preserving way. So why not just use the RockYou data or other
leaked data sets? Well, there's no demographic data, which doesn't let
me do all the experiments I would like to do. There's some question
about if the RockYou passwords represent passwords that people choose
for more secure sites because RockYou just develops a photo-sharing
application and some Facebook games. So maybe these aren't the best
passwords. And I do think that there are legitimate ethical concerns
with basing the whole field of password research on data that has been
leaked after being stolen. There was a panel we did at FC with Stuart
and some others where we got into this in more detail. I will say
actually I went into the panel, you know, being more convinced that we
should use data that gets leaked if it's available and it can advance
science. And I came away from it with actually a lot more reservations,
that there are good reasons that we shouldn't be basing what we're
doing on waiting for the next website to get hacked.
All right. So I went to Yahoo last spring as an intern and I came up
with this proposal. So the normal login process sees a stream of
usernames and passwords in clear text before checking them against
some database of hashed passwords. So I put a proxy server, for the
experiment, sitting in between the two that would be able to see the
stream of clear text passwords. So the simplest design which is
deficient but would work is to just log every password that gets seen
in plain text and throw away the usernames. So this is basically how
the RockYou data was done, is that the passwords were leaked by a
hacker who is trying to be somewhat ethical who said I'm going to leak
the passwords and not the usernames, so you get a big data set but it's
hard to tie these to individual accounts.
I think that this is certainly a bad model for an experiment you're
setting up on purpose for a couple of reasons. If this data ever
leaked, it would let you train a cracking dictionary better. And also for
people who've gone out of their way to memorize some really strong
password that's never been seen before, once it's been seen once
anywhere it can get added to a cracking dictionary and you basically
hurt that person's security. So we definitely don't want to store the
plain text passwords. We could store the hash of the passwords which is
okay. It sort of eliminates the problem of adding some password that's
never been seen before to a cracking list, but it's still possible to
go and run a cracking attack and try and figure out which passwords
people at Yahoo are most likely to use.
So the insight that basically let me do the experiment that won over
the management team is to say the proxy server at the time the
experiment starts is going to generate a key randomly which it keeps in
memory during the experiment. Every password that gets hashed will be
hashed along with the secret key. And then when the experiment is over
this key gets destroyed and the log of hashes of passwords with this
key is then impossible -- It's impossible to do the dictionary attack
because the key is gone. So it's different from salting in the
traditional sense when passwords are stored, because the point of salt
is to make it hard to tell if two users have the same password, whereas
being able to tell if two users have chosen the same password was a
requirement for doing statistics. So this is the same key for everybody
but it was 128 bits long so impractical to brute force. Yeah?
>> : What if the user mistypes the password? You're capturing here what
the user's typing when they're trying to log in, right?
>> Joseph Bonneau: Right. Yeah, so obviously I wouldn't want to log the
incorrect passwords. But I actually noted the success or failure of the
login and only kept the successes. So there are a couple more steps to
this. In particular since I wanted to be able to look at different
demographic groups of users, for every user that's seen I would do a
database query so that I could get a few bits of information about the
user. So this is sort of pass one of doing that. You see that "joe" is
logging in, and then you log his password along with, say, a gender and
a language and an age group.
>> : If you could include whether they're pregnant or not, we could get
one of the questions we're not allowed to ask you [inaudible].
>> Joseph Bonneau: Right. Right. So the problem with doing this is that
if you can re-identify users based on their demographic data then it
would be possible to pair together users and see that they had the same
password. So I want to collect really detailed demographic data. And of
course as it gets more detailed, you'll start to have people who are
unique in the experiment. And if it's possible for me to re-identify
myself and re-identify somebody else and see that we have the same
password then I've basically figured out that person's password.
So instead of storing all of the demographic data directly in the log
next to the password, the solution is to just store a bunch of
different logs that are separate for each demographic detail. So
basically, in the case of me, it will get written to the log of
passwords of users who self-identify as males; it may not be accurate.
And, you know, every other piece of demographic data that I have. And
one final thing was to just keep a Bloom filter of users that you've
seen so that you don't double count people and, of course, to check the
login status. Yeah?
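[Aside: a Python sketch of the shape of that collection proxy as described; the function and field names are illustrative, not the actual code.]

    import hmac, hashlib, os
    from collections import Counter

    SESSION_KEY = os.urandom(16)    # 128-bit ephemeral key, kept only in memory
    histograms = {}                 # one histogram of tags per demographic predicate
    seen_users = set()              # stand-in for the Bloom filter in the real design

    def record_login(username, password, login_ok, predicates):
        # Only successful, first-time logins are counted.
        if not login_ok or username in seen_users:
            return
        seen_users.add(username)
        tag = hmac.new(SESSION_KEY, password.encode("utf-8"), hashlib.sha256).hexdigest()
        for predicate in predicates:   # e.g. "gender=male", "language=en", "age=25-34"
            histograms.setdefault(predicate, Counter())[tag] += 1

    # When the experiment ends, the key is destroyed (e.g. del SESSION_KEY),
    # leaving only per-group histograms of tags that cannot be dictionary-attacked.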
>> : I'm curious, you said you did this at Yahoo. I'm curious if you
were worried about the collection proxy, K, being subpoenaed?
>> Joseph Bonneau: Being subpoenaed?
>> : Yeah. So let's say...
>> Joseph Bonneau: Subpoenaed?
>> : that someone says, you know, one of the users is Osama Bin Laden
and here is a warrant to go subpoena [inaudible]?
>> Joseph Bonneau: Right. So I wasn't worried about that for a couple
of reasons. For one thing K was destroyed, so the actual Zeiger key was
made...
>> : So you destroyed it at the end of the collection?
>> Joseph Bonneau: Yeah, yeah. Right which was over in 48 hours. And it
was -- Yeah, I mean the machine generated its own key. It was hashed
with a key that -- You know a manager generated a key separately to add
in which maybe he didn't destroy his personal copy of. But in any case,
I mean, Yahoo has a pile of hashed passwords anyway, so if a subpoena
came in they have hashes of every user's password sitting around to
begin with.
>> : But those are salted.
>> Joseph Bonneau: Yeah, but the salt doesn't matter. I mean, they have
the salt.
>> : They just subpoena the login server.
>> : Yeah.
>> Joseph Bonneau: Yeah. Yeah.
>> : So why a hash instead of pseudorandom function?
>> Joseph Bonneau: I mean I'm using a hash as a pseudorandom function.
I mean, yeah, like what -- I mean, I'm using a hash with a key as a
pseudorandom function, but, I mean, what...
>> : Like some hash functions potentially check whether two passwords -- like one password is the prefix of another password.
>> Joseph Bonneau: Hmm. Talking about like a length extension thing? I
guess I wasn't concerned about that because every password was shorter
than the block size. So we did SHA-256, so all of the passwords fit in
a single block. But, yeah, I mean point taken. Yeah.
All right. So generally the provenance of this data makes sense? Okay,
great. So afterward, yeah, the data went to me. So the experiment ran for 48
hours. The goal was to get a hundred million users, and we
miscalculated a little bit. We got seventy million. 42.5% of the
passwords were unique within the survey and computed a whole bunch of
predicate functions some of which turned out to not be very interesting
and a lot of which didn't get to a minimum sample size to do
interesting statistics on.
>> : So what was the uniqueness one more time?
>> Joseph Bonneau: That means that -- So of the seventy million
passwords, 42% didn't choose the same password as anybody else.
>> : Unique usernames but not unique passwords.
>> Joseph Bonneau: Well, every username is unique.
>> : Okay.
>> Joseph Bonneau: But a lot of the passwords, so yeah 58% of people
chose the same password as somebody else in the sample.
>> : That's very significant [inaudible] RockYou. RockYou was about a
third unique across thirty million, so this is twice the sample size.
You'd expect far less uniqueness.
>> Joseph Bonneau: Yeah, it was. I mean it'll show up on the graph
later but these passwords were stronger by a couple bits.
>> : [Inaudible] is that the RockYou passwords were weak. I mean these
people were choosing the passwords for RockYou [inaudible].
>> : Well, I'm saying the distribution shows a lot more dedication from
the users to choosing [inaudible].
>> Joseph Bonneau: Yeah. And it's significantly better in a statistical
sense. In terms of real-world impact on guessing attacks they're not
dramatically better. Okay. So I should probably go quickly over this so
that I can show you some of the results. But there's some question
about how we can approximate these metrics given what I got from Yahoo
which is a random sample. So we don’t have perfect knowledge of the
distribution like I assumed when I presented all of the metrics.
So it turns out that even sixty-nine million people, which sounds like a
really huge sample, is inadequate to compute a lot of things. So
if I take random sub-samples of what I had and compute different
metrics naively this is the entropy, the guesswork and this is the
total number of passwords. And you can see that these are still growing
pretty significantly even as we add this, you know, sixty-nine-millionth user to the sample. So basically, projecting, we can tell that
if we added many more users to the sample, we would get a higher
estimate for the entropy and the guesswork and all the rest of it. So
we can sort of see just from this graph -- and I'll show you the
theory in a second -- that it's not possible to compute these metrics
even in a sample of this size.
For the scaled-down guesswork metric, if we take it at a 25% desired
success rate we can see it's growing pretty steadily. And then at a
certain point which was smaller than our actual sample size it does
stabilize and the naïve estimate becomes correct. And for the simpler
stuff, like the min-entropy -- this is equivalent to min-entropy -- or the
min-entropy extended to ten guesses, we can basically estimate these
correctly very easily with a small sample size. So even with just about
a thousand users, we have a pretty good idea of what the Min-entropy
is.
So some results from theory: we know we can't estimate -- well, there's
no known way of estimating the total number of passwords that are
possible by just looking at a sample of this size, and we can't estimate the
entropy. And there are some general results about computing properties, any
properties of distributions, that can be extended to this guesswork
metric, which say that it won't be possible to compute the guesswork.
And the intuition behind this theorem is that you can't compute any
metric that depends on things that occur only a small constant number
of times in the sample. So particularly if things that occur only once
in the sample would affect the metric then you won't be able to compute
it accurately. And obviously in the case of guesswork, some things that
only occur once do make a big difference.
We can escape that result for some of the partial guessing statistics just
because they only depend on things that all occurred, you know, many
times in the sample. And the other theoretical result, which is
interesting, is that asymptotically you can't do any better to estimate
any of these metrics than just throw away all of the uncommon stuff and
use the common stuff which is relatively well estimated from the
sample. So that's going to be the approach.
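[Aside: a simple illustration in Python of that "only trust the well-observed part" approach; this is a crude stand-in for what the thesis does, not the actual procedure.]

    import math

    def alpha_work_factor_bits_estimate(sample_counts, alpha, min_count=5):
        """Plug-in estimate of the alpha-work-factor in bits, refusing to answer
        once it would depend on passwords seen fewer than min_count times."""
        total = sum(sample_counts)
        cum, dictionary = 0.0, 0
        for c in sorted(sample_counts, reverse=True):
            if c < min_count:
                return None     # alpha is too ambitious for this sample size
            cum += c / total
            dictionary += 1
            if cum >= alpha:
                return math.log2(dictionary / cum)
        return None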
So if I'm just trying to compute this required dictionary-size metric
at different sample sizes and I take different sub-samples of the
distribution, so I say reduce to ten million or one million samples,
you can see that for a low desired success rate even at sampled down to
five-hundred thousand people we're estimating the exact same values.
But as you get higher, one by one, the sub-samples flatten off. This is
the region where this is all based on things that we're only seeing
once in the sample. So you get these huge systematic underestimates if
you try and compute these things directly.
So question number one is just if it's possible to figure out
automatically what your limit of confidence is. And I developed a
technique in my thesis to do this by bootstrapping the smaller sample
to figure out what the confidence limit is, which I'm afraid I won't be
able to go into in any more detail than that. But it works pretty well and
can automatically pick out, marked in this diagram, by switching from a
solid line to a dashed line exactly the point where the metric is no
longer reliable. And a second question is, is it possible to actually
project if you fit the data to some model so that we can estimate the
value for higher desired success rates of the attacker. So it's not
possible to do this without assuming some model for what passwords look
like, but if we do that -- And I stole a model from the natural
language processing community that's been used to model word
frequencies, which has been an area of research for about 15 years and
has been quite challenging. This is basically one of the best
performing models for doing projection of word frequencies from sub-samples, and I fit it onto passwords and got pretty good results.
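[Aside: a much cruder stand-in for the bootstrap procedure described here, just to make the idea concrete: re-estimate the metric from random half-size subsamples and stop at the success rates where those estimates drift from the full-sample one. Here `sample` is a list of observed passwords or tags and `estimate` is any estimator, such as the one sketched earlier applied to the sample's histogram.]

    import random

    def confidence_limit(sample, alphas, estimate, trials=20, tolerance=0.1):
        """estimate(sample, alpha) returns bits, or None if it refuses to answer."""
        stable = []
        for a in alphas:
            full = estimate(sample, a)
            if full is None:
                break
            drifted = False
            for _ in range(trials):
                sub = random.sample(sample, len(sample) // 2)
                e = estimate(sub, a)
                if e is None or abs(e - full) > tolerance:
                    drifted = True
                    break
            if drifted:
                break
            stable.append(a)
        return stable    # success rates for which the naive estimate looks trustworthy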
So these are the naïve estimates, and by projection they get converted into
this. So it's certainly not perfect. It leads to a big overestimate for
the sample of five hundred thousand, but I mean compared to this
picture, fitting to an assumed distribution does allow you to make
estimates empirically that are reasonably accurate. I would probably
caution against using the estimates out here partly because we really
have no idea what passwords look like in that space. This is the space
where the cracking evaluations usually don't go. And from empirical
data, for almost every data set we have, this is the space
where you're reasoning about passwords that have only been observed once,
and it's basically impossible to do that in a meaningful way. But I do
think for up to, say, a 50% success rate, where we have somewhat of a
good idea of what the distribution of the passwords looks like, this
projection can be useful for taking a smaller sub-sample and still
estimating this with some accuracy.
All right. So some data. So -- And I've used the same projection
technique in all of these graphs that I'll show you, and I've marked
the point where I'm switching from an accurate estimate with no need for
projection to a projection. So this is a comparison of Yahoo on the top
here to RockYou in purple and to other big leaks that I was able to
get.
>> : Does Yahoo have any password composition policy? Or is it just...
>> Joseph Bonneau: Good question. Currently the only requirement is six
characters, and some of these passwords were actually collected prior to
that being instated.
>> : So 123456 is a valid Yahoo
>> Joseph Bonneau: Yeah.
>> : password [inaudible]?
>> Joseph Bonneau: Yeah. And I actually don't know what the most common
Yahoo password is because of the way the experiment was set up. But
123456 has been the most common in everything I've ever looked at, so
it's a pretty safe bet.
>> : They do you have the password [inaudible] to try to get people to
use [inaudible].
>> Joseph Bonneau: Yeah. So actually, yeah, I'll come back to that in a
second. But basically all of these websites that got leaked -- Battlefield Heroes was a game, Gawker is a blog, RockYou is Facebook
apps. So the Yahoo passwords, there is a gap of about two bits here but
it is harder. Basically across the range of different guessing attacks
the Yahoo passwords are a little bit better, but to an attacker two
bits isn't a huge slow-down.
>> : [Inaudible] putting in a lot of work, that one's actually easier
than some of the other ones?
>> Joseph Bonneau: Yeah, exactly. So on the tail here the most common
thing at Yahoo was actually more common than at some of the other
sites. Yeah, that's exactly the way to interpret this. Okay, and if I
add in a bunch more data from all those cracking experiments earlier,
and now we have to switch to the expected dictionary size and not the
expected number of guesses, it's pretty sensible. Basically all of the
cracking results that have been published are overestimates of security,
in the sense that all the guessing metrics I've proposed model an
attacker who's perfect. So it makes sense that the real cracking would
be a few bits worse than that consistently.
Although, it's also interesting how the variation between different
leaked data sets is relatively small compared to the huge variation
between different cracking attacks, which to me is a justification for
using the statistical metrics because I think that there's a lot of
uncertainty introduced by cracking. Yeah?
>> : The various cracking studies were they real password collections
or lab studies or [inaudible]?
>> Joseph Bonneau: So all of the ones here were real data. So I think I
switched from circles to hexagons here to indicate these were -- Well,
actually so these were a lab study, this purple dot here. And these
were all system passwords, and these were all real web passwords that
got leaked.
Okay. And another comparison I thought was interesting, how do
passwords compare to actual natural language. So here we have the Yahoo
passwords, and I took data from the Google Ngram Corpus which has
frequency counts for words, pairs of words, triplets of words. And
basically the conclusion is that a password is worth somewhere between two
and three words of natural language.
All right. And as for some of the demographic things, there's a massive
table in the paper with every different group. At a high level, though,
people from different countries varied -- One of the biggest variations
was splitting people up by country. I've heard various theories, I
guess, of why certain countries were stronger or weaker. Informally
there wasn't really any great correlation with region or GDP of the
country or anything like that. I think the U.S. and China were about
the two strongest. I don't have a great justification for why that is,
but I think it's interesting that the variation in countries was so
large. Yeah?
>> : Demographic mix?
>> Joseph Bonneau: Yeah, I think that it could be that...
>> : [Inaudible] the United States is.
>> Joseph Bonneau: Yeah.
>> : China is [inaudible].
>> : Aren't there still [inaudible]?
>> : [Inaudible].
>> Joseph Bonneau: So I think that the...
>> : [Inaudible] factor here.
>> Joseph Bonneau: Yeah, although, I'll come back to that in a second.
The U.S. benefits because people from a lot of different countries that
don't have -- So Yahoo has different versions in a lot of countries but
not every country. And a lot of people come to the U.S. site who are
actually from a different country.
>> : Is Hong Kong part of China here?
>> Joseph Bonneau: Well, this isn't actually where users come from by
IP address. This is users who go to Yahoo U.S. versus Yahoo China.
>> : Okay.
>> Joseph Bonneau: So people from Hong Kong can go where they want.
Okay. The age-group question that I threw out: those of you who picked
the older users being better are actually right. This green down here
is age 13 to 24, so teenagers and early twenties, and they were
actually the weakest. The variation is much lower by age than by
country but, yeah, age 45 to 54 I think was the strongest and then 55
plus was good too.
>> : Does that surprise you how close they are?
>> Joseph Bonneau: Compared to -- Well, a lot of the sub-distributions
were very close. In fact the biggest high-level conclusion is that the
demographic groups really don't change very much. There's sort of this
weird sort of universal distribution that everybody's not too far from.
I definitely thought going into it that the younger users would be
better because of this sort of digital native, you know, mythos. But I
think it's also possible that the older users are just more
conservative.
>> : You pointed out that these are very small differences. Are they
statistically significant?
>> Joseph Bonneau: Yeah, they are statistically significant. They're
statistically significant at least up to here to within .1 bits which
is smaller than the gap here. But in terms of real world significance,
if it's like .4 bits then probably not a big story for attackers. More
interesting I think was a couple of different groups of users who
should be, in quotes I guess, "more motivated" to choose a secure
password. So the people who have an account with Yahoo's retail
service, these people have a payment card registered. So, one might
expect that people who have a payment card registered will be much more
motivated. There is actually a very big gap here in that the users with
a payment card are much less likely to choose one of the most common
passwords globally.
But up here comparing to general dictionary attacks they don't really
do very much better.
>> : But isn't that precisely where you want -- I mean if I put a
payment card -- I mean if Yahoo is allowing somebody to get away with a
30% success rate, I mean it's like, well, it's all over anyway. Right?
It's...
>> Joseph Bonneau: Yeah.
>> : ...[inaudible].
>> Joseph Bonneau: Yeah. So I mean you could definitely argue that
users are making a rational choice here which is that, "I have a
payment card so I'm not going to choose 123456, but I'm also not going
to take the time to memorize say like eight random digits," or
something like that which is exactly what was observed.
>> : Why do you think people care more about, or would care more about
[inaudible]...
>> Joseph Bonneau: Well because, I mean, if their account gets hacked
then....
>> : The credit card company.
>> Joseph Bonneau: What?
>> : If your credit card gets stolen, your credit card company
[inaudible]...
>> Joseph Bonneau: I think most people don't...
>> ...[inaudible].
>> Joseph Bonneau: Most people don't know that and it's a hassle,
right. I mean I think it seems more serious if you have a credit card
registered. And I guess the last one I should show you because I'm
running a little bit over: Yahoo changed registration forms about two
years ago. They switched from no requirements and basically no feedback
to a six-character minimum and a graphical indicator to try and nudge
users to pick better passwords.
>> : But by no requirement you mean "A" would work as a password?
>> Joseph Bonneau: Yeah. So they -- I believe so, yeah. But I should
check. But I know that they didn't have -- They might've had like a
three-character minimum but they upgraded it to six. So the gap -- so
this, version two, was the old one. They upgraded to having this
graphical indicator. This made a gap of about a bit, and on the low end
it basically did nothing because most of the most common passwords on
the web are sort of prescreened to satisfy six characters. So 123456 is
more popular than 12345.
So the conclusion here might be that it's possible that Yahoo didn't do
a good job designing the graphical nudge. I mean it's also possible
that the gain you can get from nudging users just isn't that high. One
more interesting result, I think. I took the sets of passwords
registered by users coming from different languages and I said,
"Supposed I trained a guessing attack on speakers of one language and
did it against passwords registered by people who speak another
language, how efficient would that be?"
So if you are attacking German-speakers' passwords and you've trained on
German-speakers, this table is for a thousand guesses. If you've
trained correctly on the German users you would get 6.5%, and if you
trained on all these other languages your success rate declines a lot.
I mean if you trained on Korean-speakers, it goes down to 1.6%. But
it's not a huge gap.
The biggest gap I ever saw was actually only a factor of five. So the
conclusion from this was really surprising. I thought that if you
trained on the wrong language group that this would really make life
difficult as an attacker. But it turns out you can train on even the
totally wrong language group, like Vietnamese, and go to the French users and do okay. And if you come up with a global dictionary that
works against everybody, you can do this quite successfully. If you do
that you never vary in efficiency by more than a factor of two. This is
for like sort of a limited online attack, and it gets a lot worse if
you do a more extended dictionary attack. But basically the same weak
passwords are used by people everywhere no matter what language they
speak.
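[Aside: the kind of computation behind that table, sketched in Python with illustrative inputs; the real numbers come from the per-language Yahoo histograms.]

    from collections import Counter

    def cross_success_rate(train_hist, target_hist, guesses=1000):
        """Fraction of the target group covered by the training group's top guesses."""
        dictionary = [pw for pw, _ in train_hist.most_common(guesses)]
        total = sum(target_hist.values())
        return sum(target_hist.get(pw, 0) for pw in dictionary) / total

    # german = Counter({...}); korean = Counter({...})   # per-language histograms
    # print(cross_success_rate(german, german))   # ~6.5% in the talk's example
    # print(cross_success_rate(korean, german))   # ~1.6% when trained on the wrong language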
And I have a paper that's all about passwords in different languages
and character encoding where we looked at some leaked data where we
actually had the plain text. And we confirmed a really interesting
trend which is that Chinese-speakers very strongly pick passwords with
only numbers in them. So over half of Chinese-speakers, their password
is only numbers. And for English-speakers it's below 20%. A lot of it
has to do with the fact that it's very difficult to enter a password in
Chinese. So...
>> : [Inaudible] so that's surprising actually. [Inaudible].
>> Joseph Bonneau: Right. So the -- We should take that offline maybe.
Anyway if you're interested in...
>> : That's what I observed.
>> Joseph Bonneau: What?
>> : That's what I observed.
>> Joseph Bonneau: Right.
>> : Yeah, how Chinese speakers tend to use numbers, not letters.
>> Joseph Bonneau: Oh, we should catch up after. So, okay, I can
probably pause there and take any wrap-up questions. I guess I would --
the wrap-up slide I had has the reference to the paper if you want to go
read it in more detail. Also my thesis should be available soon
which has a lot more detail on it. Did I achieve the goals I set out
for? I think I've certainly introduced metrics that aren't
demographically biased. They're repeatable. They're easy to calculate.
As for how ecologically valid these metrics are, I think they're
pretty good. The only problem is that I've assumed an optimal attacker
everywhere, and that may not be true. I mean when we looked at the
password cracking it was significantly worse than my metrics would
indicate. So I think that's probably one of the bigger open questions
going forward: how much worse real guessing attacks are than optimal,
and collecting some data on what people actually do. And
of course I've also given up one property: these metrics can't be
computed with small data. So for people doing lab studies where they
want to give two different conditions to two sets of users, they'll
never get a sample that's big enough to compute these metrics.
>> : What would be big enough?
>> Joseph Bonneau: About a hundred thousand would be a rule of thumb.
>> : Yeah, so I guess that goes more to not just [inaudible] validity
but external validity. All of this assumes a single way of choosing
passwords.
>> Joseph Bonneau: Right.
>> : So you can't use this methodology if you want to estimate how good
is a system for nudging users to make better passwords unless you're
going to expose millions of users
>> Joseph Bonneau: Yeah, exactly.
>> : to it.
>> Joseph Bonneau: Yeah?
>> : Did you collect any data on password length chosen by users?
>> Joseph Bonneau: No, I didn't -- Yeah, I mean, by the design of
the experiment that information was thrown away. But we basically have
that from -- like we have the distribution from looking at the RockYou
data.
>> : Only within the parameters of the RockYou data set.
>> Joseph Bonneau: Yeah, exactly. I guess I sort of made a conscious
decision to go purely statistical, which made for an easier pitch. And I was
afraid if I started saying, "Well, this semantic information would be
good," then, you know, where would I draw the line? Because it would be
very interesting to see -- Like the thing about what percentage of
people use numbers? It would've been great within the Yahoo data to see
how that rate changed, especially when I found by looking at leaked
Chinese data how amazingly high it was but decided to not go down that
road. Yeah, do you have a question? Oh. I thought there was one over
here. Sure.
>> : So all of this is kind of, you know, there's one attacker and he's
trying to do the best he can against the whole corpus. And so if I
manage to steal the hashed password list and no one else has got it and
I'm competing with no one, this is definitely the right metric. But if
my guessing is I'm showing up at the web portal and I'm [inaudible]
limited and all that kind of thing, this is still good. But if I'm
showing up at the web portal and doing all of this, I also don't have
the field to myself, right? There's ten thousand guys out there also
trying similar strategies that they've learned from. So you could argue
that, okay, the optimal strategy is to go after the most common stuff,
123456. And congratulations, when you get into an inbox you're going to
find a thousand other guys in there who've also been using the same
strategy.
>> Joseph Bonneau: Right.
>> : [Inaudible] Given that I'm competing against an unknown number of
other attackers who are trying similar strategies, what's the right
thing? I don't want to get into something that's been bore-holed into
the ground, you know, where...
>> Joseph Bonneau: Interesting.
>> : The strategy is to be fast.
>> Joseph Bonneau: Yeah, I mean it would be easy to -- Like you could
come up with some solutions to that and say like, "I don't want to
guess the actual ten most common things. I want to guess things like them,
you know?"
>> : Just choose a random distance out in the distribution?
>> Joseph Bonneau: Yeah. And then you can reason about how much your
efficiency goes down.
>> : Yeah.
>> Joseph Bonneau: That would actually be something really interesting
to go back and compute.
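One rough way to make that computation concrete, sketched in Python: skip
the very top of the distribution and guess ranks skip+1 through skip+beta
instead of 1 through beta, then compare success rates to see how much
efficiency you give up. The function is only illustrative, not something
from the talk.

    from collections import Counter

    def success_rate(counts, beta=1000, skip=0):
        # Success rate against one random account when guessing the
        # passwords ranked skip+1 .. skip+beta by popularity.
        total = sum(counts.values())
        ranked = Counter(counts).most_common()
        window = ranked[skip:skip + beta]
        return sum(c for _, c in window) / total

    # Efficiency lost by deliberately avoiding the 100 most common
    # passwords (counts would be a real password histogram):
    # loss = success_rate(counts, 1000, 0) - success_rate(counts, 1000, 100)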
>> : The second question was, you know, so what was your plan if, you
know, you went to Yahoo and they said, "You want me to put a proxy
where?" I mean what were you going to do? Did you have a back up?
>> Joseph Bonneau: I mean, so we'd worked this out. This was like the
proposal when I applied for the internship. So, yeah. One thing I could
add is that I have thought about the model where if you're trying to
evade an intrusion detection system that means that you want your
pattern of guesses to look like the population distribution of
passwords, and there actually is a pre-existing metric that already
captures that perfectly, which is collision entropy, or Rényi entropy of
order two. And I marked it actually in the PIN thing. But,
yeah, that basically tells you if you draw your guesses randomly from
the population and then guess against random people, how likely you are
to succeed.
Yeah?
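For reference, collision entropy is H2 = -log2(sum of p_i squared) over
the population's password probabilities, and 2^(-H2) is the chance that a
guess drawn from the population matches a random user's password. A
minimal sketch, with a made-up toy histogram rather than real data:

    import math
    from collections import Counter

    def collision_entropy(counts):
        # Renyi entropy of order two: -log2 of the probability that two
        # users drawn at random chose the same secret.
        total = sum(counts.values())
        collision_prob = sum((c / total) ** 2 for c in counts.values())
        return -math.log2(collision_prob)

    # Toy histogram; a real computation would use the full population counts.
    pins = Counter({"1234": 40, "0000": 10, "2580": 5, "7395": 1})
    print(collision_entropy(pins))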
>> : Did you plot the uniqueness of the passwords' counts? Like top --
Like where that tail starts, the long tail?
>> Joseph Bonneau: I'm not sure I understand your question.
>> : Or how common passwords are? So like the first few hundred are
super common; they're not unique. And then there's that long tail.
>> : Distribution.
>> : Do you have the [inaudible] distribution.
>> Joseph Bonneau: Yeah, I mean I have the histogram. I don't think I
have a plot....
>> : Is it in the paper?
>> Joseph Bonneau: It's not in the paper. I could send you the plot.
It's a pretty simple plot.
>> : Like for instance how many distinct passwords would you have to
have to cover 10% of the password [inaudible]?
>> Joseph Bonneau: Right. Well that I have. I mean that's basically
here. Just if you move up from 10% to here it's, you know, 2 to the 14.
But, yeah, I mean I could put in uniqueness and it'd be an easy plot to
do.
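A minimal sketch of that kind of coverage calculation: walk the histogram
in decreasing order of popularity and count how many distinct passwords it
takes to cover a given fraction of accounts. The function name is
illustrative, and counts would be a real histogram such as the Yahoo or
RockYou data.

    from collections import Counter

    def passwords_to_cover(counts, alpha=0.10):
        # Number of distinct passwords, guessed in decreasing order of
        # popularity, needed to cover a fraction alpha of all accounts.
        total = sum(counts.values())
        covered = 0
        for rank, (_, c) in enumerate(Counter(counts).most_common(), start=1):
            covered += c
            if covered / total >= alpha:
                return rank
        return len(counts)

    # Usage (hypothetical): passwords_to_cover(rockyou_counts, 0.10)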
>> : It would just be interesting to compare like Yahoo and RockYou and
stuff, where that [inaudible]. Where is a spike versus a tail?
>> Joseph Bonneau: Right.
>> : [Inaudible]
>> Joseph Bonneau: Sure.
>> : So you told us we're stuck with passwords. And people choose bad
passwords. Do you have any thoughts on what we should be doing?
>> Joseph Bonneau: I mean like I think priority number one -- I think I
had a slide about this from -- So I did a survey a couple of years ago
where I tried to actually look at the state of what websites that
collect passwords actually do. And the numbers are like -- I mean,
every time I show this slide there's like a few mouths that drop
because like websites aren't at the level of actually hashing.
Most websites still don't use TLS correctly. Rate-limiting is like
really rare, actually. Like most websites just don't do it at all. So --
What?
>> : How would you know?
>> Joseph Bonneau: Well, I tested by trying a bunch of incorrect
passwords and then trying a correct one. So it's possible that I just
didn't hit the rate limit, but I figured -- I think I did a thousand
guesses and then the correct one and I got in and everything was okay.
So I figured that...
>> : 84% of websites don't [inaudible]?
>> Joseph Bonneau: No. So it's possible that -- I think a lot of
websites will eventually rate-limit because they are worried about
denial of service, but I didn't see any specific rate-limiting up to
that point. And I think it would be hard -- There's a lot of argument
about what the right rate-limit should be, but I've never heard anybody
say it should be higher than a thousand.
>> : For what? For IP?
>> Joseph Bonneau: Well, this is -- I just did one guess per second
from the same IP, so it wasn't even cloaked or anything. So, yeah. I
mean anyway, I think the point is that the world is so broken right now
that having a really good understanding of the statistics of passwords
and how hard they are to guess is really more about setting up for future
work, I think, for when hopefully we come up with some agreement that ends
every website collecting passwords and doing it wrong. And there's
a lot of work on that. I mean there's people working on different
trusted hardware devices that could potentially serve as, you know,
points for single sign-on or proposals like OpenID. Hopefully something
like that succeeds eventually, and I think in the nearer term that's
like more interesting work. But I do think like when we get to the -We'll always have the one password, so having a good idea of the
strength will be very useful when it becomes more important. Maybe
that's a good note to end on. Thanks everybody for coming.
>> : Thank you.
[ Applause ]