>> Arjmand Samuel: It is my pleasure to introduce Shahzad. He is a PhD candidate at Michigan State University. His research interests are primarily in security, protocol analysis, and behavior-based authentication. Today you will hear about some of the work he has been doing here at MSR over the summer. Shahzad.
>> Muhammad Shahzad: Thank you, Arjmand. I've been working here for about 12 weeks now; this is my last week. I've been working on behavior-based authentication using signatures and gestures on touchscreen devices.
Why are we trying to find new authentication schemes? In the computing world today we mostly have passwords and PIN codes, and if somebody else finds out your password or PIN code they can access your accounts, so we want to move away from those; we do not want to depend on them anymore. The research community has done a lot of work on keystroke rhythms, using people's typing rhythms to authenticate them. That essentially eliminates the need for usernames and passwords: you type something, the computer learns your typing pattern and behavior, and decides whether it is you or not. The problem is that this is still not a very effective approach, because it has high false positive rates. Also, whenever you want to log in using your typing rhythm, you have to type something, and usually it is fairly long, around 300 characters, which is not very convenient.
We were thinking about whether we could port such a scheme to touchscreen devices. The difficulty is that touchscreen devices do not have a physical keyboard, so there is nothing to physically guide your fingers. On a keyboard the keys guide your fingers, which is why you develop a typing rhythm. Another problem is that this kind of scheme does not work for people who are not adept at typing; you have to be experienced enough to have a stable pattern for it to work.
Since these devices have touchscreens, we asked whether we could use something else, and the first thing we thought about was signatures. Can you use signatures to authenticate yourself on your touchscreen device? Why not? Signatures are very prevalent; financial institutions have used them forever, and the reason is that even if you know somebody's signature, it is not easy to copy it. Why don't banks and other financial institutions use passwords? If I know your password, I can access your account. They use signatures because signatures are more secure than passwords. The reason passwords are still more prevalent on computing devices is that, until now, there was no good way to capture signatures on them. But with today's touchscreens, and the fine-grained resolution of the points you touch on the screen, I think signatures can be made to work on these devices, and you can probably shift from passwords to signatures. If somebody knows your password they can log in, but if somebody knows your signature they still cannot, because the way you do your signature is very distinct, and that is what we are going to explore today.
For signatures, I'm not the first person to work on this; people have done a lot of work on signatures, especially static signature analysis. You have an image of a signature and you want to match it against another signature in your database. The problem with static image analysis is: what happens if somebody copies your signature exactly? If he spends some time and manages to reproduce your signature exactly, that signature will be accepted by the static image analysis. The other problem is that these static techniques, which rely purely on image processing to authenticate a signature, require a large amount of training data. To give you an example, the MNIST database of handwritten digits has 60,000 training samples. People have done a lot of work on automatic handwritten digit recognition, and the best accuracy achieved on that data set was about 99.6 percent, using about 6,000 samples per digit. Digits are very easy compared to signatures, so if you need 6,000 samples per digit to train well enough to get 99 percent accuracy, you probably need far more than 6,000 samples to train on a signature. You do not want to ask your users for 6,000 training samples; that is just unreasonable.
We were thinking that when you do a signature on a touchscreen, the touchscreen can tell you the behavior of how you do it: at what time you touched which part of the screen, point by point. Can we use this information to reduce the number of training samples and also eliminate the problem of a perfect copy of a signature? I'm going to talk about that. Another question is whether signatures are feasible for all kinds of touchscreen devices. I have this tablet here, and it works pretty well on it, but can you do it on a phone? You cannot, actually; I will talk about that as well. For phones we wanted to look for something else, again not PIN codes or passwords, not something you have to remember, but just something you do on the screen of the phone that is then used for authentication. You have seen that on Windows Phone, to unlock the screen you press the unlock button and then swipe upward; on iPhone you press the unlock button and then swipe right. We were thinking: can we use these gestures to find out whether it is the original person or not? That is what we worked on. We made the gestures a little more sophisticated, not just small swipes, and we use those to authenticate the user, and I will show that this works. I have demo applications for both of these and I will show them to you.
The hypothesis behind all of this work is that everybody has a distinct way of doing signatures as well as gestures. Even if somebody can copy my signature, or the gesture I perform on the phone, they will not be able to do it in exactly the same way that I do it. That is the basis of our work; that is how we distinguish between people. The rest of the talk is divided into two parts. I will first talk in detail about the signature authentication scheme that we have proposed.
I will go into detail, not a lot of technical detail, but enough to explain how this works. After that I will switch to gesture-based authentication on phones. The techniques behind the gesture-based authentication are essentially a subset of the techniques for signatures, so I will not describe them in detail; I will just mention which parts of the signature-based techniques we borrowed for gestures. Then I will give you some demos, and you can try them as well.
For signatures, what is the objective? First, to find out whether the signature is visually and structurally correct: does it look the same as the original signature you are comparing it with? Second, once that is done, to find out whether it was done by the same person or imitated by somebody else who made it look right but did not do it in the same way. I'm going to show you a couple of videos to give you an idea. You will see two different signatures; they are not from two different people, they are both my signatures. You will see four samples of each signature, and you will see that the behavior looks very similar as they are being drawn. These are four samples of the signature from the same person, and you can see that they look almost exactly the same while being drawn. I have slowed them down a bit so you can see it. All four are different signatures, but they look essentially the same while they are being performed. I'm going to show you another sample here. You will also see that there are some differences, sometimes a little lag in a particular part of a stroke, but there is still a consistent behavior with which a person does his signature. We will use this fact for authenticating signatures. From this we will extract velocities in different parts of the signature, by which I mean both the magnitudes of the velocity and the directions. I will talk about this in detail later.
The next thing to look at is whether the person has a consistent pressure behavior while doing the signature. I don't know if it is very clear here, but you can see that the same parts always have less pressure; the darker the dot, the higher the pressure. There are dark parts and lighter parts, and they are always the same across samples of the signature. This should make it clearer: these are the pressure plots of the four samples, and you can see very distinct peaks at particular locations. There is one here, then two here, two here and two here, and then these at the start. These distinct peaks are, again, part of the behavior and the signature of the person, and you can use this information for authentication. This is another sample and its pressure plot, and you can see the same distinct peaks here, here and here, so you can extract this information and use it for authentication. Another person who imitates your signature exactly will not be able to reproduce this pressure plot, so, like velocity, pressure is something you can use. Sure?
>>: Doesn't this pressure plot kind of follow from the shape of your…
>> Muhammad Shahzad: No. Maybe I should have shown that. If you do your signature and try to do it a little differently, maybe a little more slowly, your pressure plot will automatically change. Pressure is not determined by the shape; pressure is how hard you are pressing. If you have a behavior, you will always press hard in some places, press lightly in others, lift your finger or whatever you are using, and that is what makes it distinct.
So far I have talked about the magnitudes of the velocities, the directions of the velocities, and the pressure. The magnitudes of the velocities give you behavioral information, and so does the pressure. The directions of the velocities give you structural information: you use the directions to check whether the structure of the signature is correct, and you use the velocity magnitudes and the pressure to check whether the behavior of doing the signature is correct. There are other features as well that I will talk about.
Let's see how to extract these features. The issue is that some parts of a signature do not have consistent behavior across different samples, and you do not want to use information from those parts, because it will increase false positives and imposters will still be able to log in. You want to use only the information from those parts of the signature that have consistent behavior. So how do we extract those features? An easy way is to divide the whole signature into several sub-strokes. Each signature has several strokes; this particular signature has eight: one, two, three, four, five, six, seven, eight. You divide each stroke into several small parts, and then you also have to figure out how small these sub-strokes should be. If a sub-stroke is too small, it becomes essentially instantaneous and there will be no consistency across samples. If you make it too large, it averages out all the interesting information and, again, you will not be able to find any distinguishing information. You have to pick an appropriate sub-stroke size. So there are two challenges: first, finding out which parts of the signature have consistent behavior, and second, choosing a sub-stroke size in those parts that actually captures that consistent behavior.
To find the consistent features, you divide each stroke into sub-strokes of a certain duration T. We used T equal to 20, 30, 40 and 50 milliseconds, so you divide each stroke into sub-strokes of different durations. Then take one particular sub-stroke, maybe this black one here. Take that same sub-stroke from all the signatures you have and extract the average feature value from it. For example, if you want the pressure for this sub-stroke, take the pressure values at all the points inside it and average them, and do the same for the corresponding sub-strokes in all the other signatures.
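To make the sub-stroke step concrete, here is a minimal Python sketch. It assumes each stroke is recorded as (x, y, time, pressure) touch samples; the function name, the array layout, and the exact averaging choices are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def substroke_features(stroke, T=0.03):
    """Split one stroke into sub-strokes of duration T (seconds) and return
    the mean speed, direction, and pressure for each sub-stroke.

    `stroke` is an (n, 4) array of touch samples with columns x, y, t, pressure.
    Illustrative sketch only; the exact feature definitions in the talk may differ.
    """
    x, y, t, p = stroke.T
    dt = np.diff(t)
    vx, vy = np.diff(x) / dt, np.diff(y) / dt
    speed = np.hypot(vx, vy)          # velocity magnitude between consecutive samples
    angle = np.arctan2(vy, vx)        # velocity direction between consecutive samples

    feats = []
    n_sub = int(np.ceil((t[-1] - t[0]) / T))
    for k in range(n_sub):
        # Samples whose timestamps fall inside the k-th window of length T.
        in_win = (t[1:] >= t[0] + k * T) & (t[1:] < t[0] + (k + 1) * T)
        if not in_win.any():
            continue
        # Circular mean for the direction to avoid wrap-around at +/- pi.
        mean_dir = np.arctan2(np.sin(angle[in_win]).mean(),
                              np.cos(angle[in_win]).mean())
        feats.append((speed[in_win].mean(), mean_dir, p[1:][in_win].mean()))
    return np.array(feats)            # one (speed, direction, pressure) row per sub-stroke
```

Running this with T set to 20, 30, 40 and 50 ms, as in the talk, gives one candidate feature per sub-stroke and window size; the next step is to keep only the ones that are consistent across training samples.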
If you have, for example, a hundred training samples, then you have a hundred of these sub-strokes, and therefore a hundred average values of whatever feature you are using: pressure, velocity, or direction. Once you have those hundred values of a sub-stroke feature across all the signatures, you take the mean and the standard deviation and look at whether the standard deviation is large or small. If the standard deviation is large, that feature is not consistent across different samples of the same signature. If it is small, you can probably use it, because the user shows the same behavior every time. So we use the coefficient of variation, which is simply the ratio of the standard deviation to the mean, with thresholds of 0.1 and 0.2. A value of 0.1 means the standard deviation is only about plus or minus 10 percent of the mean, which is very restrictive. The reason we use such small thresholds is that it is acceptable for the legitimate user to be rejected occasionally and not be able to log in, but it is not acceptable to let an imposter log in even once. Using a small coefficient of variation will increase your false negatives, which might become annoying, but it should never let an imposter log in, because you are allowing so little deviation in the feature values that an imposter should not be able to match them.
We take the coefficient of variation of each sub-stroke feature over the whole signature and check whether it is below the threshold. All feature values whose coefficient of variation is below the threshold are used to train the classifier; if a feature's coefficient of variation is above the threshold, we simply ignore it.
Using this strategy, this is what we get. Let's look at it in action. This is the entire signature, divided into several parts. From each part we compute the direction value and its coefficient of variation across samples, and you can see that these green areas are always consistent: the coefficient of variation of the direction value there is always less than 0.1. You extract features from these areas and ignore the rest. Whenever somebody tries to log in, he provides a signature; you extract the direction values from only these parts of the signature and compare them, because they are very consistent, always the same for the legitimate user. If they match, it is probably the legitimate user. But these are not the only features we use. If you raise the coefficient of variation threshold to 0.2 (the previous plot was for 0.1), you can use a little more information; 0.2 is still acceptable, not too loose. This is the other signature, and these are the areas from which we use the direction values. With a threshold of 0.2 you extract values from a few more locations in the signature.
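The coefficient-of-variation test itself is simple to write down. Here is a minimal sketch, assuming the per-sub-stroke features have already been arranged into one row per training signature; the function name and the guard against near-zero means are my own illustrative choices.

```python
import numpy as np

def select_consistent_features(feature_matrix, cv_threshold=0.1):
    """Select sub-stroke features whose coefficient of variation across
    training samples is below `cv_threshold`.

    `feature_matrix` is an (n_samples, n_features) array: one row per training
    signature, one column per candidate sub-stroke feature (a pressure, speed,
    or direction average). Returns the column indices that pass the test.
    """
    mean = feature_matrix.mean(axis=0)
    std = feature_matrix.std(axis=0)
    # Coefficient of variation = std / mean; guard against near-zero means.
    cv = np.where(np.abs(mean) > 1e-9, std / np.abs(mean), np.inf)
    return np.flatnonzero(cv < cv_threshold)

# Usage sketch (hypothetical data file):
# X_train = np.load("signature_features.npy")
# keep = select_consistent_features(X_train, cv_threshold=0.1)
# X_reduced = X_train[:, keep]
```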
Now for velocity magnitudes: with a coefficient of variation threshold of 0.1, this particular user does not have enough consistency, so we did not use any values there; at 0.2 there are quite a few places from which you can extract the velocity magnitudes. The same is true for this user: at 0.1 no velocities were extracted, but at 0.2 there is enough consistency that the coefficient of variation is below the threshold and you can use those velocity values. If even at a coefficient of variation of 0.2 there are no feature values you can extract, nothing consistent at all, you can keep increasing the threshold; that will eventually let the legitimate user log in, but with a high threshold there is also a good chance you will let an imposter in. For all the users we studied, 0.1 and 0.2 were sufficient, so we never had to go above 0.2; generally 0.2 gives you enough features. For the pressure, which we saw was very consistent across signatures, we were able to use many parts of the signature, for both of these signatures actually.
There are other features you can use as well, for example the total time of the signature. For this particular user the total time was generally about 3.9 seconds, varying from roughly 3.7 to 4 seconds; for the second user it varied from 2.8 to 3.2 seconds, so there is not too much variation and you could probably use the total signature time as well. We are not actually using the total time; we use the time of each stroke, which is consistent enough. These are the other features we use: the duration of each stroke in the signature, which gives behavioral information; the inter-arrival time of strokes, that is, the time between two strokes, how long it takes you to lift your pen or finger after one stroke and start the next; and the displacement between the bounding boxes of the strokes, which I will explain in a moment. You take all the stroke times, all the inter-stroke times, and all the displacements between consecutive bounding boxes, and again run the same coefficient of variation test with thresholds of 0.1 or 0.2: if it is below the threshold you use the feature, otherwise you don't. This is one signature from one of the users. You can see that for the inter-stroke times and the stroke times the coefficient of variation is reasonable, less than 0.1 in most cases here and less than 0.2 in most cases there. Sometimes, in some signatures, some stroke times do not have a coefficient of variation below 0.2, and we simply do not use those stroke times.
This is what I meant by bounding boxes. The bounding box of a stroke is the smallest rectangle, the rectangle with the smallest area, that encloses that stroke. You take these boxes and join the centers of consecutive boxes. You can see that all four of these signatures from the same user have essentially the same pattern: the location of a particular stroke relative to the other strokes is usually the same. The previous slide showed the coefficient of variation of the direction and the distance; the distance means this distance between centers, and the direction means this direction. The coefficient of variation of the direction is always very small, so that is a very good feature, and we use it. In some cases you can also use the distance between bounding boxes, so that is usable too. So far I have covered the features we use: the velocity magnitudes, the directions, the pressure, stroke times, inter-stroke times, and the displacement between bounding boxes.
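A small sketch of the bounding-box features, assuming each stroke is available as a list of (x, y) points and using axis-aligned boxes; the units and exact definitions are assumptions for illustration.

```python
import numpy as np

def bounding_box_features(strokes):
    """Compute the displacement and direction between the centers of the
    bounding boxes of consecutive strokes.

    `strokes` is a list of (n_i, 2) arrays of (x, y) points, one per stroke.
    Returns one (distance, direction) row per pair of consecutive strokes.
    """
    centers = []
    for pts in strokes:
        xmin, ymin = pts.min(axis=0)
        xmax, ymax = pts.max(axis=0)
        # Center of the smallest axis-aligned rectangle enclosing the stroke.
        centers.append(((xmin + xmax) / 2.0, (ymin + ymax) / 2.0))
    centers = np.asarray(centers)

    deltas = np.diff(centers, axis=0)                # vector between consecutive centers
    distance = np.hypot(deltas[:, 0], deltas[:, 1])  # displacement magnitude
    direction = np.arctan2(deltas[:, 1], deltas[:, 0])
    return np.column_stack([distance, direction])
```

These distance and direction values would then go through the same coefficient-of-variation test as the other features.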
These are the features we use to tell one signature from another. We train a classifier, which I will talk about, and then we do the testing; that's how it works. Before that, there is one last thing I want to mention. I am not going to go into the exact details; if somebody is interested we can take it offline. This is what this user's signature generally looks like: it has eight strokes. You can see that these two signatures have only seven strokes: here this stroke is merged with another one, and here these two strokes are merged. We came up with a way to separate them. If you have seven strokes, there is no way to form the sub-strokes, correlate them, and find good sub-strokes to extract features from, so you first need to make the number of strokes in the signature consistent, and only then extract the features. Effectively, we take information from all the strokes in the training data, and when a problematic signature comes in we look at all its strokes and figure out which stroke is inconsistent with the training data and should be split. Once we find that, we again use the training data to figure out exactly where in that stroke to split, remove that small part of the stroke, and end up with the desired number of strokes, like this. For this one, we removed a little bit here and it divides into two; for this one, we remove this part and the number of strokes becomes consistent.
All right, that is the whole feature extraction part. After that we use Support Vector Distribution Estimation. It is a one-class estimation, so you do not need any information from the imposter class. One-class SVM is called Support Vector Distribution Estimation, and an open implementation is available online, so we used libSVM. We took these features, trained our classifiers, and did a search for parameter optimization. I am not going to talk about that because it is standard machine learning. We did that, and it works pretty well.
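To show roughly what the classifier step looks like, here is a minimal one-class SVM sketch using scikit-learn's OneClassSVM, which is built on libSVM. The nu and gamma values are illustrative guesses, not the tuned parameters from the talk, and the parameter search mentioned in the talk is not reproduced here.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def train_signature_model(genuine_features):
    """Train Support Vector Distribution Estimation (one-class SVM) on the
    legitimate user's consistent-feature vectors.

    `genuine_features` is an (n_signatures, n_features) array for one user.
    nu bounds the fraction of genuine training samples treated as outliers,
    i.e. the tolerated false-reject rate on the training data.
    """
    model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
    return model.fit(genuine_features)

def is_legitimate(model, new_features):
    """predict() returns +1 for in-class (legitimate) and -1 for outliers."""
    return model.predict(new_features.reshape(1, -1))[0] == 1
```

Because the model is trained only on the legitimate user's samples, the feature selection step above is what keeps the decision boundary tight enough to reject imposters.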
I'm going to give you a demo of this now. After the presentation anybody is welcome to try these signatures and log in. I'm not sure whether I'll be able to do it here or not, but generally I can, so please trust me. Let's see. It tells you here that it's a legitimate user. These were done by me. I'm going to do it again and make them look like this. I cannot make them look exactly the same, but I will make them similar enough that a person looking at them would say they are from the same person. Then I will try to do it differently and it will tell you that it's not the same person. These are kind of hard, but if you want to try it you are welcome to.
>>: Does it matter if the user gets visual feedback from…
>> Muhammad Shahzad: Yes, it matters. We have not added visual feedback, but I think it would make it much more convenient for the user. The reason I'm not drawing the signature on the screen is that I need a very good sampling rate; if I try to draw it on the screen, it reduces my sampling rate, and when the sampling rate drops, my accuracy drops. See this? This one is imposter. This one is legitimate, and they look exactly the same to the naked eye, right?
There will always be differences; you can never do two signatures exactly alike. If I signed this at the bank they would probably accept it. That is the idea behind this. That was the signatures part; now let's move to gestures. Why don't we just use signatures on the phone? How hard could it be? This hard. This was the best I could get out of my own signature; I did both of these. This is what it looks like on the phone. Maybe you could get some accuracy there as well; we didn't try it, but I don't think it would be very good, because it is hard to reproduce your own signature on a phone. The screen is small, you have no place to rest your hand, and that makes it hard to do your signature on the screen. And if you are driving, you probably do not want to do your signature to unlock your screen, right? There should be an easier way for phones.
What do we mean by gestures? These are the gestures we have; there are 10 of them. I started with 39 gestures and gave phones to some volunteers, about 40 of them. They performed those gestures, we analyzed them, and we saw which gestures have distinguishing features and which do not. The simple swipes on the iPhone or Windows Phone do not carry enough distinguishing information, but if we use two or three fingers in different directions, or curves, or pinch gestures like zoom in and zoom out, those do carry distinguishing information and can be used to authenticate the person. These are the ten gestures in my final demo application. We ask the user to provide at least 30 training samples for each gesture; thirty is the minimum, but the more the merrier. We take that information and train our classifier.
Let me talk about how we do that. The basic scheme is almost the same as for signatures. Each finger corresponds to a stroke; you divide each finger's stroke into several sub-strokes and extract features like velocity, time, bounding boxes, and displacement between bounding boxes. We are not using pressure here because our phone could not give us pressure information; the API for pressure is not available right now, but I read somewhere that it will be soon. Once pressure information is available, I think it will make the scheme much more robust, because pressure is something you probably cannot figure out just by watching a person: you can see everything else about how he does the gesture, but not the pressure. So once pressure is available we will probably use it and see how it improves accuracy.
Again, we took those features, trained our classifier, and tested it. As I mentioned, we have data from about 40 users, so we take the data for these ten gestures from any user who wants to train, use the other users as imposters, and rank which features are most distinguishing for this particular user. This runs as a cloud service: once the features are ranked, that information is sent back to the phone. The phone then automatically asks you which gestures to perform; you perform them, the phone uploads the data back to the cloud, and the cloud does the classification and tells you whether it is you or not.
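The talk does not say which criterion is used to rank features against the imposter pool, so here is a minimal sketch using a simple Fisher-style separability score as a stand-in; the function name and scoring formula are assumptions for illustration only.

```python
import numpy as np

def rank_features(genuine, imposter):
    """Rank gesture features by how well they separate one user's samples
    from the pooled imposter samples.

    `genuine`: (n_g, n_features) array for the enrolling user.
    `imposter`: (n_i, n_features) array pooled from the other users.
    Returns feature indices, most distinguishing first.
    """
    mu_g, mu_i = genuine.mean(axis=0), imposter.mean(axis=0)
    var_g, var_i = genuine.var(axis=0), imposter.var(axis=0)
    # Large mean gap relative to within-class spread => more distinguishing.
    score = (mu_g - mu_i) ** 2 / (var_g + var_i + 1e-9)
    return np.argsort(score)[::-1]
```

In a setup like the one described, this ranking would be computed in the cloud service and only the top-ranked gestures and features sent back to the phone.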
If you want to be more secure, you probably want to do three, four, or five gestures; if moderate security is enough, one gesture will do. Two or three gestures are going to be more secure than one. That brings me to the demo on the phone, which you can try. I have already done it once; it says legitimate, and now you can try it. Whoever wants to, take the phone, press the test button with the number selected set to three, do the gestures, and see whether you can imitate them or not. The phone is over there.
The last thing I want to talk about is how to do the training. One interesting thing we observed during this work: I sat down and did one hundred signatures as training data, then tried to test myself. I trained on those hundred signatures, ran all the simulations in Matlab, and found that something like 99 out of one hundred signatures were being identified correctly, and the same was true for some of the volunteers who did it the same way: they sat down and gave me one hundred samples. So it looked like it was going to work very well. But when we actually built the application and a user tried to sign again and sent that signature to our testing server, it would not recognize that person. That was strange, because with those one hundred samples we were training on 99 and testing on the remaining one, and it worked perfectly fine for every such split. So why not the signatures obtained at runtime? We found that when you give all of your training samples in one sitting, you develop a pattern which is not your natural pattern, and you cannot reproduce it later. If you do your signature one hundred times in a row, you will find yourself doing it in a very distinct, very consistent way, which gives very good accuracy in simulations, but you cannot imitate it even half an hour later; it becomes hard. So we did the training again. This time I asked several users, including myself, to train ten samples at a time, with breaks of half an hour, an hour, or even days in between, and once we did that it started working. That is how the training has to be done, and that is also my concern about some of the earlier work: people have done this kind of work on phones and tablets and report very good accuracy in simulation, but I don't think it will work in real life. You have to train like this; don't take too many training samples at once, because you develop patterns that are not your natural patterns. I would even suggest five samples at a time, with breaks, training over a period of maybe a week; then it works pretty well. That was what my work here was about. Any questions?
>>: I was also wondering about when you want to use your tablet in the morning, or if you come home from partying, probably your fingers will be…
>> Muhammad Shahzad: I was a little worried when I came here, because I was going to do my signatures while standing, and I had done all my training and all my testing on my cell phone while sitting. When I tried to do it here I thought I might not be able to, but I was.
If you have been doing your signature for a while, you can do it; it does not really make too much of a difference. You always have a [indiscernible]. There is also a lot that our scheme takes into account that I did not talk about. For example, if your total time increases, say you usually do the signature in 4 seconds but this time you are sluggish and take 5 seconds, it can still use those 5 seconds as the reference and check whether the rest of the behavior is there, and it will be; you normalize with your total time and so on, so it handles different situations. But for gestures, yes, you do see this problem, and we plan to work on it in the future. If you are driving, or nervous, or otherwise not in a normal state, it becomes hard to log in: if you are walking it is probably going to be hard, and if you are driving it is going to be hard. So the best approach is to train it in all the different scenarios, and then it will work. Yes?
>>: Is it something that if you add more and more training sessions to it over time it becomes more and more accurate?
>> Muhammad Shahzad: It does, but that depends on what kind of training you are doing. If you just put in random samples, it will probably get worse; if you do it again and again without caring about being consistent, that makes it worse. But otherwise, yes, it gets better. The training part is tricky: you have to be consistent and patient, because it becomes a little tedious and frustrating, but if you stay consistent it works pretty well after that. Yes?
>>: So you looked at full signatures for authentication. Did you look at anything shorter, like the first part of your name or some kind of made-up gesture? That you are not going to [indiscernible]
>> Muhammad Shahzad: Made-up gestures? No, we didn't ask people to do random ones. Actually, I think we did, but we didn't do that analysis; we have a list of gestures right now and we just analyzed those. Again, the main idea behind our work was to understand and use the behavior of doing something. If you just make something up right now, train it right now, and try to do it afterwards, you probably won't be able to do it again. To get rid of the password, you need something that is consistent. Maybe handwriting, I don't know; I haven't worked on that. Maybe you just write something and it can recognize you.
>> Arjmand Samuel: Any more questions? Well, let's thank Shahzad for his presentation. [applause]