
>> Arjmand Samuel: It is my pleasure to introduce Shahzad. He is a PhD candidate at Michigan
State University. His research interests are primarily in security, protocol analysis, and
behavior-based authentication. Today you will hear about some of the work he has been doing
while here at MSR for the summer. Shahzad.
>> Muhammad Shahzad: Thank you, Arjmand. I've been working here for like 12 weeks now.
This is my last week here. I've been working on behavior-based authentication using signatures
and gestures on touchscreen-based devices. Why are we trying to find new authentication
schemes? Currently in the computer world we have mostly passwords and pin codes and if
somebody else finds out your password or pin code they can access your accounts, so we want
to get rid of those. We want to not be dependent on those anymore. The research community
has been doing a lot of research on keystroke rhythms, the typing rhythms of people, to use those for authenticating people. That kind of eliminates the use of usernames or
passwords, so you enter something. The computer finds out your typing pattern and typing
behavior and finds out if it is you or not. But the problem is that it is still not a really effective way of doing it because it has very high false positive rates. The thing is that whenever you
have to login, and if you want to use your typing rhythms, you have to type something and
usually it's considerably long like 300 characters or something like that. That's not very
convenient. Anyway, we were thinking can we port some kind of scheme to touchscreen
devices. The problem with touchscreen devices is when you try to port such schemes on
touchscreen devices is that touchscreen devices do not have a physical keyboard so there is
nothing really to guide your fingers physically. In keyboards there are keys so they can kind of
guide your fingers so you have a typing or rhythm pattern. Another thing is this kind of scheme
does not work for people who are not very adept at typing, so you have to be very experienced
and you have to have a pattern for it to work. We were thinking that now that these devices
have touchscreens can we use something else. And of course the first thing that we thought
about was signatures. Can you use signatures to authenticate yourself on your touchscreen
devices? And why not? Signatures are very prevalent these days. They have been forever in
financial institutions and the reason being even if you know somebody’s signature it's not easy
to copy them. Why don't these banks and all of these financial institutions use passwords? If I
know your password I can login or I can access your account. They use signatures because
signatures are more secure compared to passwords. But the reason why passwords are still more prevalent on computing devices is that there was no way to get signatures into these devices. But now, with these touchscreens and with the very fine
granularity of resolution that you can attain of the points that you are touching on the screen, I
think this thing can be ported, I mean signatures can be made to work on these touchscreen
devices and you can probably shift from passwords to signatures. If somebody knows your
password they can login, but if somebody knows your signature they cannot login because the
way you do your signature is very distinct and that's what we are going to explore today. For
signatures, I'm not the first person doing work on this; people have done a lot of work on signatures, especially on static signature analysis schemes. You have an image of the signature and you want to match it to another signature that you have in your database. The problem with static image analysis is: what happens if somebody copies your signature exactly? If he spends some time and is able to copy your signature exactly, that signature will be accepted by the static image analysis. The other problem is that these static image analysis techniques, which just use image processing to authenticate the signature, require a large amount of training data. To give you an example, the MNIST database of handwritten digits has 60,000 training samples, and people have
been doing a lot of work on automatic handwritten digit recognition and the best accuracy that
somebody was able to achieve was 99.6 percent on that data set and that is using about 6000
samples per digit. Digits are very easy compared to signatures, so if you need 6000 samples for
each digit to get it to train so well that you get 99 percent accuracy, you probably need a lot
more than 6000 to train on your signature. You do not want to ask your users to give you 6000
training samples. That's just ridiculous. We were kind of thinking: when you are doing signatures on a touchscreen, the touchscreen can basically tell you the behavior of the way you sign. It can tell you, for each point, at what time you touched which part of the screen. Can we use this information to reduce the number of training samples and also
eliminate the problem of perfect copying of signatures? I'm going to talk about that. Another
thing that we will talk about is whether these signatures are feasible for all kinds of touchscreen
devices? I have this one here. Probably it works pretty well on this thing, but can you do it on a
phone? You cannot, actually. I will talk about that as well. We wanted to look for something
else for phones, again, not pin codes and passwords, not something that you have to
remember, but just something that you do on the screen of the phone and then that is what
you use for authentication. You guys have seen that in Windows Phone whenever you have to
unlock the screen on this phone you have to just make a swipe. You first press the unlock
button and then you swipe upward to unlock the phone. On the iPhone you press the unlock button
and then you swipe right to unlock the phone. So we were thinking like can we use these
gestures to find out if it's the original person or not? That's what we worked on. We made the gestures a little more sophisticated, not just simple swipes, and we use those to authenticate the user, and I will show that that is what we
can do. I have the demo applications for both of these things and I will show it to you guys and
you will see. The hypothesis behind all of this work is that everybody has a distinct way of
doing signatures, so, signatures as well as gestures, so if I can do my signatures, anybody else,
even if he can copy it or the gesture that I perform on the phone, even if somebody else can do
it, they will not be able to do it in exactly the same way that I do it. That is our basis of the
work. That is how we distinguish between people. The rest of the talk is divided into two parts.
I will first talk in detail about the signature authentication scheme that we have proposed. I will
go into detail, not a lot of technical detail, but I will give details about how this thing works.
After that I'm going to switch to the gesture-based authentication on the phones, the
techniques that we use behind the gesture-based authentication is kind of a subset of the
techniques for signatures, so I'm not going to talk about that technique in detail. I'll just
mention what parts of the techniques of the signature-based authentication we borrowed for
the gesture-based authentication. Then I will give you some demos and you guys can try that as
well. Signatures, what is the objective here? Objective is to find out if the signature is visually
correct, if it's structurally correct or not. Does it look the same as the original signature that
you are comparing with? And the second thing, once that is done, is to find out whether the signature has been done by the same person or imitated by some other person who made it look right but did not do it the same way. I'm
going to show you a couple of videos here just to give you an idea. You will see here two
different signatures being done by two different people and -- they are not two different
people; they are my signatures. Anyway, you will see four samples of those signatures on each
of the screens and you will see they kind of look very similar when you do it, the behavior there,
you will see it. These are four samples of the signature from the same person being done and
you can see that they kind of look exactly the same when you do it. I have slowed them down a
bit for you guys to see it. All four are different signatures, but they look kind of the same when
you are actually doing it. I'm going to show you another sample here. You will also see that
there are some differences, sometimes a little lag in some particular part of the strokes, but
there is still a behavior with which a person does his signatures. We will use this fact for
authenticating the signatures. From this particular thing we will extract velocities in different
parts of the signatures. From velocity I mean the magnitudes of the velocity as well as the
directions. I will talk about this in detail later. The next thing that we should see when we analyze this is whether the person has a consistent behavior in the pressure he puts on the screen when he does the signature. I don't know if that's very clear or not, but you can see
that there are some parts, the same parts that have less pressure, so the darker the dot is the
higher is the pressure. There are these dark parts and there are lighter parts so they are always
the same in the signature. This probably will make it more clear. These are the pressure plots
of these four samples and you can see that you have very distinct peaks at different locations.
This is one here and then there are two here, two here and two here. And then there are these
at the start. There are distinct peaks. This is, again, the behavior and the signature of the
person. You can use this information for authentication. This is another sample and this is another pressure plot. You can see that these have the same distinct peaks here, here, here, and so you can extract this information out and try to use that for authentication.
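To make the peak idea concrete, here is a minimal sketch, assuming the touch API reports (timestamp, pressure) samples for one signature; the prominence and tolerance values below are illustrative guesses, not parameters from the talk:

```python
# A rough sketch of spotting the distinct peaks in a pressure trace.
# Assumes the touch API delivers (timestamp_ms, pressure) samples for one
# signature; prominence and tolerance are illustrative guesses.
import numpy as np
from scipy.signal import find_peaks

def pressure_peak_positions(samples):
    """Peak locations as fractions of the total signature duration."""
    t = np.array([s[0] for s in samples], dtype=float)
    p = np.array([s[1] for s in samples], dtype=float)
    peaks, _ = find_peaks(p, prominence=0.05 * p.max())
    return (t[peaks] - t[0]) / (t[-1] - t[0])

def peaks_match(reference, candidate, tol=0.05):
    """Crude check: every reference peak has a nearby candidate peak."""
    return len(candidate) > 0 and all(
        np.min(np.abs(candidate - r)) < tol for r in reference)
```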
And another person who imitates your signature exactly will not be able to follow this pressure plot, so, like velocity, pressure is another thing that you can use. Sure?
>>: Doesn't this pressure plot kind of follow from the shape of your…
>> Muhammad Shahzad: No. Maybe I should have done that. If you do your signature, try to
do it a little differently and then, maybe a little slowly or something. Your pressure plot will
automatically change. Pressure is not based upon the shape. Pressure is how much pressure
you are putting. So if you have a behavior you will always press hard, press little, pull up your
pointer finger or whatever you are using and then that kind of makes it distinct. Right now I
talked about velocities that you can use, the magnitude of velocities, the directions and the
pressure. The magnitudes of the velocities kind of give you the behavioral information. The
pressure will also give you the behavioral information. The directions of the velocity give you
the structural information. You use the directions to find out if the structure of the signature is
correct, like if it's really correct or not. You use the magnitudes of the velocities and the
pressures to find out if the behavior of doing the signature is correct or not. There are other
features as well that I will talk about. Let's see how to extract these features, because the thing
is that you don't really know which exact -- there are parts of signatures which do not have
consistent behavior among different samples, so you don't want to use any information from
that particular part of the signature because it is going to increase false positives and imposters
will still be able to login. You want to not use that information and use only that information
from those particular parts of signatures that have consistent behaviors. So how do we extract
those features? An easy way is to divide the whole signature into several sub strokes. Each
signature has several strokes. This particular signature has eight strokes. This is one, one, two,
three, four, five, six, seven and eight. You have eight strokes. You divide each stroke into
several small parts, and then the thing is that you also have to figure out how small these sub strokes should be. If it's too small it will kind of become instantaneous and there won't be any
consistency among different samples. If you make it too large it will average out all the
interesting information and you, again, won't be able to find out if there is any distinguishing
information in there. You have to have a particular correct size for the sub strokes. There are
two challenges. The challenge is first, you have to find out what parts of the signatures have
consistent behavior and in those parts what should be your sub stroke size which will actually
extract a consistent behavior. If the sub stroke is too large, it again averages out; if it's too small, it is instantaneous and you don't get anything out of it. To find the consistent features you divide each stroke into sub strokes of a certain time T. We use T equals 20
milliseconds, 30 milliseconds, 40, 50, so you divide each stroke into several sub strokes of
different time periods, and then take one particular sub stroke, like maybe this black one here. Take this particular sub stroke from all the signatures that you have and then just extract the
average feature value from this particular sub stroke. For example, from this particular sub
stroke you can, if you want to find the pressure, take the pressure values among all the points
in this sub stroke and take the average. From the other sub strokes from all the other signatures, take
the pressure values, take the average. If you have, for example, a hundred training samples, so
you have a hundred of these sub strokes and you have a hundred average values of that
particular feature that you are using. It could be pressure. It could be velocity. It could be
direction. Once you get the hundred values of those signatures, of those sub stroke features
from all the signatures, you take the mean and you take the standard deviation. Then try to
find out if the standard deviation is large or small. If a standard deviation is large, that means
this feature is not consistent among different samples of the same signature. If that standard
deviation is small, then you can probably use it because the user has the same behavior every
time. So we use coefficient of variation which is just simply the ratio of standard deviation to
mean. We use coefficient of variation thresholds of 0.1 and 0.2. 0.1 means the standard deviation is just plus or minus 10 percent of your mean value, which is very restrictive. The reason why we are using such a small value of the coefficient of variation is that it is okay for the legitimate user to sometimes be rejected and not be able to log in, but it's not okay to let an imposter log in even once. So you use a small value of the coefficient of variation. This will increase your false negatives, which might
become annoying, but it will never let an imposter get logged in because you are allowing such
a small amount of deviation in the feature values that an imposter may not be able to do it. We
take the coefficient variation of each sub stroke in the whole signature and get the feature
values and see if the coefficient variation is below this threshold. For all the feature values
where you have the coefficient variation below this threshold, you use those feature values for
training of your classifier. If a certain feature has a coefficient variation greater than this
threshold, you just don't use it. You ignore that one. Using this strategy this is what we get.
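As a compact sketch of this selection step, assuming each training signature has already been reduced to per-sub-stroke average values of one feature (pressure, speed, or direction), the check might look like this; the array layout, function name, and the commented usage are illustrative:

```python
# Coefficient-of-variation feature selection over sub-stroke features.
# `samples` is assumed to have shape (n_signatures, n_substrokes): the
# average value of one feature for each sub-stroke of each training sample.
import numpy as np

def consistent_substrokes(samples, cv_threshold=0.1):
    """Indices of sub-strokes whose feature is consistent across samples."""
    mean = samples.mean(axis=0)
    std = samples.std(axis=0)
    cv = std / (np.abs(mean) + 1e-9)        # coefficient of variation
    return np.where(cv < cv_threshold)[0]   # keep only the consistent parts

# For example, keep pressure where CV < 0.1 and velocity where CV < 0.2:
# pressure_idx = consistent_substrokes(pressure_samples, 0.1)
# velocity_idx = consistent_substrokes(velocity_samples, 0.2)
```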
Let's look at that in action. This is the entire signature. It is divided into several parts, and from
each part we calculate the value of direction and then take the coefficient variation from each
sub stroke, and then you see these particular areas, the green ones, are always consistent. They have a coefficient of variation of the value of direction that is always less than 0.1. You extract
features out of these and you ignore the rest of the stuff. Whenever somebody tries to log in, he will provide a signature. You will extract the values of direction from only these parts of the signature and then you will try to compare them, because these are very consistent. They are always the same for the legitimate user. If they match, then it is probably the legitimate user.
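The direction and velocity values themselves come straight from the timestamped touch points. A small sketch of that computation, assuming each stroke is a time-ordered list of (x, y, t_ms) samples and using one of the sub-stroke sizes mentioned above; the naive averaging of angles is a simplification:

```python
import numpy as np

def substroke_velocity(points, window_ms=20):
    """Average speed and direction per fixed-length sub-stroke of one stroke."""
    pts = np.asarray(points, dtype=float)
    x, y, t = pts[:, 0], pts[:, 1], pts[:, 2]
    dt = np.diff(t)
    vx, vy = np.diff(x) / dt, np.diff(y) / dt
    speed = np.hypot(vx, vy)
    direction = np.arctan2(vy, vx)
    # Group consecutive samples into sub-strokes of roughly window_ms each.
    bins = ((t[1:] - t[0]) // window_ms).astype(int)
    avg_speed = np.array([speed[bins == b].mean() for b in np.unique(bins)])
    avg_dir = np.array([direction[bins == b].mean() for b in np.unique(bins)])
    return avg_speed, avg_dir
```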
But these are not the only things that we use. If you increase the coefficient of variation to 0.2 -- 0.1 was this -- you can use a little bit more information out of it. 0.2 is still
acceptable. It's not too bad. This is the other signature. These are the areas from where you
use the values of direction. If you increase the coefficient variation to 0.2 you extract values
from a few more locations of the signature. Velocities, the magnitudes of the velocities: if you use a coefficient of variation of 0.1, this particular user does not have enough consistency, so we didn't use any values here. If you go to 0.2 you have quite a few places from where you can extract the magnitudes of the velocity. Same is the case with this user: coefficient of variation of 0.1, no velocities extracted; 0.2, yes, you have enough consistency that the coefficient of variation is less than 0.2 and you use these velocity values. If somebody finds that even for a
coefficient variation of 0.2 you do not have any feature values that you can extract, like nothing
is consistent, you can increase, keep on increasing the coefficient variation so it will let the
legitimate user login eventually, but again, if you use high coefficient variation there is a good
chance that you will let an imposter log in as well. Although for the users that I have studied, all the ones that we had, we were able to do it with 0.1 as well as 0.2, so we didn't have to go above
0.2. I think generally 0.2 gives you enough features. From the pressure, we saw the pressure
was kind of very consistent among different signatures. We use a lot of the parts from the
signature for the pressure, from both of these signatures actually. There are other features
that you can use. For example, total time of the signature. For this particular user the total
time was generally 3.9 seconds and it varies from about 3.7 to 4 seconds. For the second user it
varied from 2.8 to 3.2 seconds, so it's not too much variation. You can probably use total time
of the signature as well. We are not really using the total time of the signature. We are
actually using time of each stroke. That gives you enough, that is consistent enough. These are
the other features that we use. The duration of each stroke in the signature, this gives you
behavioral information. Inter-arrival time of strokes. That is the time between two strokes, like
how much time it takes you to lift your pen or finger from one stroke and start the other one.
We use that. And then we also use displacement between bounding boxes of the strokes. I will
talk about what bounding boxes are. To extract all these features, you take all the stroke times, you take all the inter-stroke times, you take all the displacements between consecutive bounding boxes, and you again run the same test with a coefficient of variation of 0.1 or 0.2. If it's
less than that you use that feature. Otherwise you don't. This is one signature of one of the
users. We can see that for the inter-stroke times, the time between two strokes, and the stroke times, the coefficient of variation is reasonable, less than 0.1 in most cases there and less than 0.2 in most cases there. Sometimes in some signatures some stroke times do not have a coefficient of variation of less than 0.2. We just do not use those stroke times. This is what I meant by bounding
boxes. Bounding box of each stroke means the smallest rectangle, rectangle with smallest area
that bounds that stroke, so you take that box then basically join the centers of all of the
consecutive boxes. You can see that each user, so all of these four signatures have kind of the
same pattern. The location at which a particular stroke is located compared to the other
strokes is kind of usually the same. This previous thing showed the coefficient variation of
direction and the distance. The distance means this distance and direction means this
direction. So the coefficient of variation of direction is always very low, so that is a very good
feature to use, so we use that feature. And in some cases you can also use the coefficient
variation of the distance between bounding boxes, so that's also usable. Now I have talked
about what features we use that include velocity, the magnitude of the velocity, the direction,
the pressure, stroke times, inter-stroke times and displacement between bounding boxes.
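For the stroke-level features, a similar sketch; each stroke is again assumed to be a time-ordered array of (x, y, t_ms) samples, and the bounding-box center is the midpoint of the smallest enclosing rectangle (the function and variable names are made up for illustration):

```python
import numpy as np

def stroke_level_features(strokes):
    """Stroke durations, inter-arrival times, and bounding-box displacements."""
    durations, centers = [], []
    for s in strokes:
        s = np.asarray(s, dtype=float)
        durations.append(s[-1, 2] - s[0, 2])            # time spent on the stroke
        lo, hi = s[:, :2].min(axis=0), s[:, :2].max(axis=0)
        centers.append((lo + hi) / 2.0)                 # bounding-box center
    gaps = [np.asarray(n)[0, 2] - np.asarray(p)[-1, 2]  # pen-up time between strokes
            for p, n in zip(strokes[:-1], strokes[1:])]
    deltas = [c1 - c0 for c0, c1 in zip(centers[:-1], centers[1:])]
    displacements = [float(np.hypot(d[0], d[1])) for d in deltas]
    directions = [float(np.arctan2(d[1], d[0])) for d in deltas]
    return durations, gaps, displacements, directions
```

Each of these lists would then go through the same coefficient-of-variation test before being fed to the classifier.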
These are the features that we use to identify a signature from the other signatures. We train a
classifier. I will talk about that and then we do the testing and that's how it works. Before I do
that, there's one last thing I want to talk about, which is this. I'm not going to go into the exact details.
If somebody is interested we can take it off-line. Sometimes when you are doing this, this is kind of what the signature generally looks like from a user. It has eight strokes here. You can
see that these two signatures have seven strokes. This particular stroke is combined and these
two strokes, they are combined. We came up with a way to basically separate them. If you
have seven strokes there is no way you can make the sub strokes and then correlate and try to
find good sub strokes that you could extract features from. So you first need to make the
number of strokes in the signature consistent and then after that you extract the features.
Generally, what we effectively do is we take information from all the strokes of the training data, and when a problematic signature comes in we take all its strokes and try to find out which stroke we should split, which stroke is not consistent with the training data. Once we find that out, we again use the training data to find out at which exact point we should split that stroke, and then we remove that small part from the stroke and get the desired number of strokes, like this. For this one, we just removed a little bit of the stroke here and it divides into two; for this one, we remove this part and it makes the number of strokes consistent.
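The talk deliberately skips the exact splitting procedure, so the sketch below only guesses at its flavor: when a signature comes in with one stroke too few, find the stroke whose duration best matches two consecutive training strokes and split it at the proportional time point. The duration-based matching is an assumption, not the method actually used:

```python
import numpy as np

def split_merged_stroke(strokes, expected_durations):
    """Split the stroke that most looks like two merged training strokes.

    `strokes` is a list of (x, y, t_ms) sample arrays; `expected_durations`
    holds per-stroke durations learned from the training data. The matching
    criterion below is a guess, not the talk's actual procedure.
    """
    durations = [s[-1][2] - s[0][2] for s in strokes]
    best, best_err = (0, 0), np.inf
    for i, d in enumerate(durations):
        for j in range(len(expected_durations) - 1):
            err = abs(d - (expected_durations[j] + expected_durations[j + 1]))
            if err < best_err:
                best, best_err = (i, j), err
    i, j = best
    frac = expected_durations[j] / (expected_durations[j] + expected_durations[j + 1])
    s = np.asarray(strokes[i], dtype=float)
    cut_time = s[0, 2] + frac * (s[-1, 2] - s[0, 2])
    k = int(np.searchsorted(s[:, 2], cut_time))
    return strokes[:i] + [s[:k], s[k:]] + strokes[i + 1:]
```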
All right. So that is all of the feature extraction part. After that we use Support Vector Distribution Estimation. It's a one-class estimation, so you do not have any information from the other, imposter class. We use an open implementation of SVM; one-class SVM is called Support Vector Distribution Estimation, and that's available online, so we use libSVM for that. We took these features, trained our classifiers, did a search for
parameter optimizations. I'm not going to talk about that because that is a standard machine learning part. We did that and it works pretty well. I'm going to give you a demo of this
thing now. After the presentation anybody is welcome to try to do these signatures and try to
log in. I'm not sure if I'll be able to do it here or not, but generally I have been able to, so please trust me.
Let's see. It tells you here that it's a legitimate user. These are done by me. I'm going to do it
again. I will make them look exactly like this thing. I cannot make them look exactly the same, but I will
make them look similar enough that if a person visually sees them he will say that this is from
the same person. I will try to do it differently and then it will tell you that it's not the same
person. These are kind of hard, but if you want to try it you are welcome to.
>>: Does it matter if the user gets visual feedback from…
>> Muhammad Shahzad: The reason that I'm not giving feedback -- yes, it matters. If you have visual feedback, we have not done that, but I think it's going to be much more convenient for the user. The reason I'm not doing that is because I need to have a very good sampling rate, and if I try to draw the thing on the screen it kind of reduces my sampling rate. When it reduces my sampling rate it reduces my accuracy. See this? This one is an imposter. This one is legitimate,
and they look exactly the same to the naked eye, right? This was, I mean there will always be
differences. You can never do your two signatures exactly alike. If I signed it at the bank
probably they would accept it. This is the idea behind this thing. That was the signatures part.
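Before moving to the gestures, here is a minimal sketch of the classifier step described earlier; scikit-learn's OneClassSVM is used as a convenient stand-in for libSVM's support vector distribution estimation, and the file name and parameter values are placeholders, not settings from the talk:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# One row per training signature, one column per selected feature value
# (consistent sub-stroke values, stroke times, bounding-box features, ...).
X_train = np.load("training_features.npy")   # placeholder path

# nu bounds the fraction of training signatures treated as outliers; gamma
# would normally come from the parameter search mentioned in the talk.
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)

def is_legitimate(feature_vector):
    """+1 means it looks like the enrolled user, -1 means an imposter."""
    return clf.predict(np.asarray(feature_vector).reshape(1, -1))[0] == 1
```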
Now let's move to the gestures. Why don't we just use signatures on the phone? How hard
could it be? This hard. This was the best that I could get out of my signatures. I do both of
these. This is what it looks like on the phone. Maybe, maybe you can have some accuracy here
as well. We didn't try that, but I don't think it's going to be very good because it's hard to
imitate your own signatures on the phone. The screen is small. You don't have any place to
rest your hand and it makes it hard to do your own signatures on the screen. Again, if you are
driving you probably do not want to do your signatures to unlock your screen, right? There
should be an easier way for the phones. What do we mean by gestures? These are the
gestures that we have. We have 10 gestures. I started with 39 gestures. I gave a few phones
to some volunteers. We've got like 40 volunteers. They performed those gestures and we analyzed them. We saw which gestures have distinguishing features and which gestures do not.
The swipes on the iPhone or the Windows Phone, they do not have distinctive enough
information, but if we make it two fingers or three fingers in different directions or if we make
it like some curves or like a pinch, like zoom in and zoom out, that has distinguishing information and you can use that for authenticating the person. These are the ten gestures
that I have in my final demo applications. What we do is we ask the user to do the training,
provide at least 30 training samples for each gesture. Thirty is the minimum, but the more the
merrier. We take that information and then we train our classifier. I want to talk about how
we do that. The basic thing behind the whole scheme is almost the same as signatures. Each
finger basically represents a stroke. You divide each finger’s stroke into several sub strokes and
then extract features like velocity, time, bounding boxes, displacement between bounding
boxes. We are not using pressure here because our phone could not give us the pressure
information. That feature is not being used, but I think we could use it. The API for the pressure right now is not available, but it will be available soon; I read that somewhere. Once
that is there, once you can use the pressure information, it's going to make it much more
robust, I think, because pressure is something you probably cannot figure out by just watching a
person do a signature. You can see how he does it. You can see all the things, but not the
pressure. So once the pressure information is available we'll use that probably and see how it
improves our accuracy. Again, the same thing, we took those features, trained our classifier,
tested them. As I mentioned, we have data from something like 40 users, so what we do is we
take all of the data from these ten gestures from any user who wants to train. Then we use all
those 40 users as imposters and try to rank which features for this particular user are most distinguishing, and once we rank them then we send that information -- so we do this in a cloud service -- back to the phone. The phone then automatically tells you which gestures you should perform, and you perform those gestures, and then it uploads the information back to the cloud and it does the classification and tells you if that is you or not.
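The ranking criterion isn't spelled out in the talk, but one simple way the cloud side could score how well a feature separates the enrolling user from the imposter pool is a Fisher-style ratio; this is purely an illustrative assumption:

```python
import numpy as np

def rank_features(user_samples, imposter_samples):
    """Rank features by an assumed Fisher-like separation score.

    Both inputs are (n_samples, n_features) arrays: the enrolling user's
    gesture features and the pooled features from the ~40 imposter users.
    """
    mu_u, mu_i = user_samples.mean(axis=0), imposter_samples.mean(axis=0)
    var_u, var_i = user_samples.var(axis=0), imposter_samples.var(axis=0)
    score = (mu_u - mu_i) ** 2 / (var_u + var_i + 1e-9)
    return np.argsort(score)[::-1]   # most distinguishing features first
```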
If you can, if you want to be more secure you probably want to do like three gestures or four
gestures or five gestures. If it's just okay, you can do one gesture. Two gestures, three gestures
are actually going to be more secure compared to one gesture. That brings me to the demo
with the phone. This is what you guys can try. I have already done it once. It says legitimate, and then you guys can try it. Whoever wants to do it, take the phone and do it. Just press the test button, with the number of features selected set to three, and do that and see if you can imitate it or
not. The phone is there. The last thing I want to talk about is how to do the training. One
interesting thing that we observed while doing this work was I sat down and I did one hundred
signatures as training and I tried to test myself again. I did the one hundred signatures in
training. I trained them. I did all the simulations in Matlab and found out that I'm getting like
99 out of one hundred signatures being identified correctly, and the same was the case with some of the volunteers that did it in the same way. They sat down and gave me one hundred samples.
Apparently, it looks like it's going to work pretty well. But when we actually made the application and the user tried to sign again and we gave that signature to our testing server, it wouldn't recognize that person, so that was kind of strange, because we had one hundred samples. We were training on 99 samples and testing on one sample, working perfectly
fine. So we trained on another 99 samples, testing on the remaining one sample, and it worked very well. So it was able to recognize them, but why not the signatures that you are getting at runtime? And we found out that when you give all of your training samples in one row, you develop a pattern which is not your natural pattern, and you cannot imitate that again. If you start doing your signatures one hundred
times, you will actually see that you are doing it in a very distinct way which is very, very
consistent, gives you very good accuracy in simulations, but you cannot imitate that after half
an hour. That becomes hard. So we did it again. What we did again is I asked several users,
including myself, we did training ten samples at a time and with several breaks, half an hour,
hour, days of breaks and once we did that it started working. That's the kind of thing that you
have to do in the training and that's the kind of thing I have seen in some of the earlier work.
They have done this kind of work on phones and some of the tablets as well and they have very
good accuracy in simulation, but I think it's not going to work in real life. You have to do it like
this, don't take too many training samples at once because you develop patterns which are not
your natural pattern. Even, I would suggest, like five samples at a time and then take breaks
and then train over a period of time, like maybe a week, and then do it and then it works pretty well. That was what my work here was about. Any questions?
>>: I was also wondering about when you want to use your tablet in the morning and if you
come home from partying probably your fingers will be…
>> Muhammad Shahzad: I was kind of worried when I came here because when I was going to
do my signatures while standing -- I have done all my training and all my testing on my cell
phone always sitting. When I was trying to do it here, I kind of thought that I would not be able
to, but I was able to. If you are doing your signatures, I mean, if you have been doing them for
a while you can do it. That doesn't really make too much of a difference. You always have a
[indiscernible]. In our scheme, I did not talk about all of the stuff it takes into account. For example, if you increase your total time, so if you generally do it in 4 seconds and this time you do it in 5 seconds because you are lazy, it can still figure out, using those 5 seconds as a reference, whether the rest of the stuff has the behavior in it, and it will, so you normalize with your total time and all that stuff. It works in different conditions. But the gestures,
yes. In gestures you do see this problem and we plan to work on it in the future. For example,
if you are driving or you are nervous or something that kind of makes it hard for you to log in.
But if you are not in your normal state, like if you are walking, it is probably going to be hard. If you are driving it's going to be hard. So the best way to do it is to train it in all different scenarios and it will work. Yes?
>>: Is it something that if you add more and more training sessions to it over time it becomes
more and more accurate?
>> Muhammad Shahzad: It does, but that depends on what kind of training you are doing. If
you are just putting in random samples, it's probably going to get worse. If you just think that
you are going to do it again and again and again and don't care about being consistent, then it's
going to make it worse. But otherwise, yes, it makes it better. The training part is tricky. You
have to be very consistent in that and you have to be patient with that because it becomes kind
of tedious and frustrating a little bit. But if you stay consistent in that it works pretty well after
that. Yes?
>>: So you looked at full signatures for authentication. Did you look at anything shorter, like
the first part of your name or some kind of made up gesture? That you are not going to
[indiscernible]
>> Muhammad Shahzad: Made up gestures? No. We didn't ask people to do random -- I think
we did, but we didn't do that analysis. We have a list of gestures right now. We just analyzed
those. Again, the thing is the main idea behind our work was to understand the behavior, to
use the behavior of doing something. If you just make up something right now and if you do it
right now, train it right now and do it afterwards, you probably won't be able to do it again. In
order to get rid of the password, you need to have something that is consistent. Maybe
handwriting, I don't know. I haven't worked on that. Maybe you could just write something and it could recognize it.
>> Arjmand Samuel: Any more questions? Well let's thank Shahzad for his presentation.
[applause]