International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 11, November 2013 ISSN 2319 - 4847

Probabilistic Neural Network for User Authentication Based on Keystroke Dynamics

Sarab M. Hameed1, Mais M. Hobi2 and Sumaya Saad3
1,2,3 Computer Science Department, Baghdad University, Baghdad, Iraq

Abstract
Computer systems and networks are increasingly used for many types of applications; as a result, the security threats to computers and networks have also increased significantly. Traditionally, password-based authentication has been widely used to authenticate legitimate users, but this method has many weaknesses, such as password sharing, brute-force attacks, and dictionary attacks. The aim of this paper is to improve the password authentication method using Probabilistic Neural Networks (PNNs) with three distance measures (Euclidean Distance, Manhattan Distance, and Euclidean Squared Distance) and four keystroke dynamics features: Dwell Time (DT), Flight Time (FT), a combination of DT and FT, and Up-Up Time (UUT). The results show that the Euclidean Squared Distance with the UUT feature provides a lower error rate and higher accuracy than the other two distances.

Keywords: Biometrics, Keystroke Dynamics, Probabilistic Neural Network, User Authentication.

1. INTRODUCTION
Internet applications use an authentication scheme to make sure that only genuine individuals can log in to the application [1]-[2]. User authentication is the process of validating a claimed identity for the purpose of performing trusted communications between parties in a computing application [3]. Biometrics offers automated methods for authentication and identification based on physiological or behavioral characteristics [4]-[5].
Keystroke dynamics authentication is based on the idea that each user has a unique keystroke latency pattern that differs from those of other users [6]. There are two types of keystroke dynamics. The first is static keystroke dynamics, in which the typed text is fixed and the timing information is captured at login time. The second is continuous keystroke dynamics, in which individuals are authenticated independently of what they type on the keyboard, and the typing characteristics are analyzed during the complete session [7].

This paper is organized as follows. Section 2 presents related work. Section 3 illustrates the features of keystroke dynamics. Section 4 presents probabilistic neural networks, Section 5 describes the proposed approach for user authentication, and the evaluation criteria are explained in Section 6. Finally, the experimental results, results analysis, and conclusions are given in Sections 7, 8, and 9 respectively.

2. RELATED WORK
Several previous works are related in one way or another to the present work.
- In [8], a comparison between ADALINE (Adaptive Linear Element), based on the single perceptron model, and the BPNN (Back Propagation Neural Network) model, using both the FT and the digraph latency time, concluded that the BPNN outperformed ADALINE, which was not capable of classifying the patterns.
- In [9], a statistical method was used to classify users as legitimate users or impostors. The extracted FT feature was used, and 63 users participated in the experiment.
- In [10], a comprehensive statistical study was carried out, and the development of a keystroke dynamics-based user authentication system using a neural network with the FT feature was presented. The conclusion was that neural network-based methods give better results than statistical methods in keystroke pattern classification.
- In [11], users' typing biometrics were measured using fuzzy logic.
This experiment involved 29 users who provided their user ID (identification) and a password of length eight or more. Common DT features were used, and the typing difficulty between two successive keystrokes was calculated. One main disadvantage of using typing difficulty as a keystroke feature is the need to set the categories of difficulty, as there is a wide range of possible ways to define typing difficulty.

3. KEYSTROKE DYNAMICS FEATURES
The main purpose of the feature extraction phase is to extract vital features from the timestamps collected from raw keystroke data for template generation. The extracted features are critical to keystroke dynamics-based biometrics and directly influence the performance of the classifier [11]. Extracting the right features from a dataset can reduce the computational complexity of the problem [14]. Each sample in the dataset is represented by a sequence of timing information expressing the exact times at which keys have been pressed and released. Many common types of features can be extracted from human keystrokes, as follows [11]-[12]:
1- Dwell time: the time from when a key is pressed (Down) until it is released (Up).
2- Flight time: the interval between a key release (Up) and the next key press (Down).
3- Down-Down: the interval between two successive key presses (Down).
4- Up-Up: the interval between two successive key releases (Up).
5- Tri-graph: the elapsed time between the first key press (Down) and the third key press (Down).
6- Placement of the fingers: a camera is required for this type.
7- Pressure of keystroke: a special pressure-sensitive keyboard is required for this type.

4. PROBABILISTIC NEURAL NETWORKS
The PNN consists of an input layer, a pattern layer, a summation layer, and an output layer. The number of nodes in the input layer depends on the length of the timing information vector for a specific user. The pattern layer contains one neuron (node) for each available training sample, and the neurons are split into the two classes. Each neuron in the pattern layer computes a distance measure between the presented input vector and the training example represented by that pattern neuron. The summation layer contains one neuron for each class, while the output layer contains one neuron.

The PNN algorithm starts by reading the timing information vector x and feeding it to each Gaussian function in each class. Then, for each group of hidden nodes, all Gaussian function values at the hidden nodes are computed as illustrated in equation (2) [17].

$$p_i(x) = \exp\left(-\frac{D^2}{\sigma^2}\right) \qquad (2)$$

where
$p_i$: represents the output of a pattern node.
$x$: is the timing information vector to be assigned to class $c_i$.
$\sigma$: is the smoothing factor.
$D$: represents the distance between the timing information vector $x$ and the sample vector (reference vector) $y$.

In this paper, $D$ is computed using three types of distance measures, as illustrated in the following equations [15]-[16], where the sums run over the components $k$ of the vectors:
1- Euclidean Distance: one of the most popular distance metrics between two vectors $x$ and $y$, computed as in equation (3).

$$D(x, y) = \sqrt{\sum_{k} (x_k - y_k)^2} \qquad (3)$$

2- Euclidean Squared Distance: uses the same equation as the Euclidean distance metric, but does not take the square root.
3- Manhattan Distance: the Manhattan (or city block) distance between vector $x$ and vector $y$ is calculated as in equation (4).

$$D(x, y) = \sum_{k} |y_k - x_k| \qquad (4)$$

After these steps, for each group of hidden nodes, all Gaussian function values are fed to the single summation node for that group, as illustrated in equation (5).
$$y_j(x) = \frac{1}{n_j} \sum_{i=1}^{n_j} p_i(x) \qquad (5)$$

where
$y_j$: represents the output of the summation node for class $j$.
$n_j$: represents the number of samples in the pattern layer of class $j$.
$p_i$: represents the output of pattern node $i$.

Finally, the maximum of the summed values is found at the output node by comparing $y_1(x)$ and $y_2(x)$. If $y_1(x) > y_2(x)$, then $x$ is assigned to class 1; otherwise it is assigned to class 2.

5. PROPOSED APPROACH FOR USER AUTHENTICATION
The aim of this section is to apply the PNN to keystroke dynamics. This section also clarifies the attributes that can be extracted from users based on their typing styles and that can be employed to maximize the subtle differences between the typing styles of users.

5.1 Keystroke Dataset Collection Phase
An essential part of any keystroke dynamics system is the acquisition of users' keystroke data from the keyboard [13]. Therefore, the keyboard property is set so that the proposed system can distinguish accurately between authentic users and impostors. The acquired dataset is a collection of keystroke timings for specific passwords gathered over a period of three months. The users were randomly selected from the staff of Baghdad University, College of Science, to participate in building the required datasets. Two datasets were created: the first is named Keystroke-1 and the second Keystroke-2. In Keystroke-1, 17 users were asked to type the password "computer" twenty-five times, five times (i.e., five samples) per weekly session, so five weeks were required to collect the dataset for each user. The users typed the password in different sessions because, when a user types the same password continuously again and again, the typing speed may increase.
Spreading the sessions out also captures the variation in each user's typing under different circumstances. However, the password "computer" is considered weak because of the relative ease with which a third party can guess it or find it via a dictionary attack. Hence, another dataset, Keystroke-2, was built in a similar way to Keystroke-1, but with 16 participating users (thirteen of whom also participated in Keystroke-1). Here the users were asked to type the password "comp.84-rl", which satisfies the strong password selection criteria, i.e., it is 10 characters in length and combines letters, symbols, and numbers.

5.2 Feature Extraction Phase
In this paper, the DT, FT, and UUT features are extracted from the collected dataset. Figure (1) depicts the three major features that are extracted when the users type the "computer" password. The same representation can be repeated for the second password, "comp.84-rl".

Figure 1 Features Representation of "computer" Password

The length of the timing information vector depends on the length of the password. For example, the password "computer", which contains eight characters, results in eight DT values, seven FT values, and seven UUT values. In general, a password with n characters yields n DT values and n - 1 values each for FT and UUT.

5.3 Preprocessing
To enable the proposed system to distinguish between authentic users and impostors with a minimum error rate, a normalization process is performed as shown in equation (1).

$$F' = \frac{F - MinF}{MaxF - MinF} \qquad (1)$$

where $F'$ is the normalized feature value, $F$ is the feature value, $MinF$ is the minimum value that the feature $F$ can take, and $MaxF$ is the maximum value that the feature $F$ can take.

5.4 Training Phase
This section illustrates how the PNN is used in the proposed system. Table (1) shows the number of nodes in each layer when the features of the "computer" password are extracted. The first row represents the PNN structure when the DT feature is used.
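The input-layer sizes follow directly from the feature definitions of Section 5.2 and the normalization of equation (1). The following is a minimal sketch of both steps, not the authors' implementation. The timestamps are hypothetical, the function names are illustrative, the DT+FT vector is assumed to be a simple concatenation, and the normalization bounds are taken from the sample itself purely for illustration.

```python
from typing import List, Tuple

# One keystroke as (press_time, release_time) in milliseconds.
Keystroke = Tuple[float, float]


def dwell_times(keys: List[Keystroke]) -> List[float]:
    """DT: release time minus press time of the same key (n values)."""
    return [up - down for down, up in keys]


def flight_times(keys: List[Keystroke]) -> List[float]:
    """FT: next key press minus current key release (n - 1 values)."""
    return [keys[i + 1][0] - keys[i][1] for i in range(len(keys) - 1)]


def up_up_times(keys: List[Keystroke]) -> List[float]:
    """UUT: interval between two successive key releases (n - 1 values)."""
    return [keys[i + 1][1] - keys[i][1] for i in range(len(keys) - 1)]


def normalize(values: List[float], min_f: float, max_f: float) -> List[float]:
    """Min-max normalization as in equation (1)."""
    span = max_f - min_f
    if span == 0:
        # Degenerate case: all values equal, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - min_f) / span for v in values]


# Hypothetical press/release timestamps for the 8 keys of "computer".
sample: List[Keystroke] = [
    (0, 95), (140, 230), (290, 370), (430, 515),
    (560, 660), (700, 790), (850, 925), (990, 1085),
]

dt = dwell_times(sample)
ft = flight_times(sample)
uut = up_up_times(sample)
dt_ft = dt + ft  # assumed concatenation for the combined DT+FT feature

print(len(dt), len(ft), len(uut), len(dt_ft))  # 8 7 7 15

# Illustration only: bounds taken from this single sample.
dt_normalized = normalize(dt, min(dt), max(dt))
```

With these definitions, each UUT value equals the corresponding FT value plus the dwell time of the following key, which is one way to read the later remark that UUT implicitly contains DT and FT.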
The number of input layer nodes represents the length of the timing information vector for the specific feature. The number of pattern layer nodes represents the total number of training samples (users).

Table 1: PNN Information Structures for "computer" Password

| Feature(s) | Input layer nodes | Pattern layer nodes | Summation layer nodes | Output layer nodes |
|---|---|---|---|---|
| DT | 8 | 17 | 2 | 1 |
| FT | 7 | 17 | 2 | 1 |
| DT+FT | 15 | 17 | 2 | 1 |
| UUT | 7 | 17 | 2 | 1 |

In this paper, there are two classes: the first class for authorized users and the second for impostors. For example, in Keystroke-1 there are 17 users; the first 6 users are authorized and the remaining 11 users are impostors when the "computer" password is used. In Keystroke-2 there are 16 users; the first 6 users are authorized and the remaining 10 users are impostors.

The main aim of the PNN training phase is to find a proper value of the smoothing parameter $\sigma$. In this paper, an algorithm is proposed to determine the proper range for $\sigma$, as clarified in Algorithm (1). After the proper range for $\sigma$ is obtained, the training samples in the pattern layer nodes are presented as input information vectors to the input layer. The value of $\sigma$ is then varied within the proper range, and the PNN algorithm is applied repeatedly until the lowest classification error rates are reached. This value of $\sigma$ is considered the best value for good classification.

Algorithm (1): Smoothing Parameter Range Determination
Input: Set of training timing information vectors $p_{i,j}$.
Output: Proper range of the smoothing parameter $\sigma$.
Step 1: [Find Mean Vector] Compute the mean (centroid) vector for each class $c_k$, $1 \le k \le 2$.

$$\mu_k = \frac{1}{N_k} \sum_{j=1}^{N_k} p_{i,j}$$

where
$N_k$: is the total number of patterns in the specific class.
$p_{i,j}$: is feature number $i$ of pattern number $j$.
Step 2: [Find Standard Deviation Vector] Compute the standard deviation vector for each class $k$.

$$std_k = \sqrt{\frac{1}{N_k} \sum_{j=1}^{N_k} (\mu_i - p_{i,j})^2}$$

where $\mu_i$: is the $i$th value of the mean vector.
Step 3: [Find Range] Find the minimum and maximum standard deviation values for each class to obtain the proper range of the $\sigma$ value.

5.5 Authentication Phase
The authentication process consists of verifying whether the input of the user corresponds to the claimed identity. The way of capturing these inputs depends greatly on the kind of keystroke dynamics used (in this paper, for example, the user must type the password). The features are extracted from the raw biometric sample and compared to the model of the claimed user. In this way, two authentication factors are used: a knowledge-based factor, which is the password of the user, and a biometric-based factor, which is the way the password is typed.

As mentioned in the PNN training section, the optimum value of the smoothing parameter $\sigma$ is obtained so that it can be used in the PNN authentication phase. The user is asked to type the password. When the user types the correct password, the PNN authentication begins to check whether the user is authentic or not. The timing information vector is extracted and presented to the input layer nodes. At the pattern layer, the similarity score between a reference template and the test data template is computed. The output score of a pattern layer node lies in the range [0, 1]. This output score is then passed to the summation layer. Finally, the decision is made at the output layer to classify the user as either authentic or an impostor.

6. EVALUATION CRITERIA
To evaluate the predictive performance of the proposed biometric authentication system, four measures are calculated: False Reject Rate (FRR), False Accept Rate (FAR), Mean Error Rate (MER), and Accuracy. The formulas for calculating these measures are given in equations (6), (7), (8), and (9) respectively [12].

FRR: is defined as the rate at which users are rejected when they should be authenticated.

$$FRR = \frac{\text{Number of rejected legitimate users}}{\text{Total number of legitimate attempts}} \qquad (6)$$

FAR: is defined as the rate at which users are accepted when they should be rejected.

$$FAR = \frac{\text{Number of accepted impostor attempts}}{\text{Total number of impostor attempts}} \qquad (7)$$

MER: is the mean of FAR and FRR.

$$MER = \frac{FAR + FRR}{2} \qquad (8)$$

Accuracy: is defined as the proportion of true results in the population.

$$Accuracy = \frac{\text{Number of correct attempts}}{\text{Total number of attempts}} \qquad (9)$$

7. EXPERIMENTAL RESULTS
Two experiments are conducted independently. In each experiment, the two passwords "computer" and "comp.84-rl" are used by the PNN. In the first experiment, the approach is tested on the same samples used for training, i.e., Keystroke-1 and Keystroke-2. Each sample contains the mean values of the keystroke timing vectors. This experiment yields error rates of 0% and an accuracy of 100% for the two passwords "computer" and "comp.84-rl" with all three types of distance, as shown in Tables (2-3).

The second experiment tests the proposed approach online. This experiment computes the selected feature(s) for each key the user types, and then applies the preprocessing to the computed timing vector. Moreover, this experiment covers testing when the proposed approach uses the first trial of password typing and when it uses three trials of password typing. The results of this experiment are shown in Tables (4-5).
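To make the scoring of such an experiment concrete, the following minimal sketch classifies a few attempts with a two-class PNN and then computes the measures of equations (6)-(9). It is an illustration under stated assumptions, not the authors' code: the pattern-node output is assumed to have the form exp(-D^2 / sigma^2), the value sigma = 0.1 and all vectors are hypothetical toy data, and the function names are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def euclidean(x: Vector, y: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def squared_euclidean(x: Vector, y: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def manhattan(x: Vector, y: Vector) -> float:
    return sum(abs(a - b) for a, b in zip(x, y))


def pnn_is_authentic(x: Vector, authentic: List[Vector], impostor: List[Vector],
                     sigma: float, distance: Distance) -> bool:
    """True when the class-1 (authentic) summation output exceeds class 2."""
    def class_score(samples: List[Vector]) -> float:
        # Pattern layer: one Gaussian node per training sample
        # (kernel form is an assumption of this sketch).
        outputs = [math.exp(-(distance(x, y) ** 2) / sigma ** 2) for y in samples]
        # Summation layer: mean of the pattern outputs of the class.
        return sum(outputs) / len(outputs)

    return class_score(authentic) > class_score(impostor)


def evaluate(decisions: List[bool], labels: List[bool]) -> Dict[str, float]:
    """decisions[i]: attempt accepted; labels[i]: attempt is legitimate.
    Returns FRR, FAR, MER and Accuracy as percentages."""
    legit = [d for d, l in zip(decisions, labels) if l]
    imp = [d for d, l in zip(decisions, labels) if not l]
    frr = 100 * legit.count(False) / len(legit)
    far = 100 * imp.count(True) / len(imp)
    correct = sum(d == l for d, l in zip(decisions, labels))
    return {"FRR": frr, "FAR": far, "MER": (far + frr) / 2,
            "Accuracy": 100 * correct / len(labels)}


# Hypothetical normalized training vectors for the two classes.
AUTHENTIC = [[0.20, 0.30, 0.25], [0.22, 0.28, 0.27]]
IMPOSTOR = [[0.80, 0.70, 0.90], [0.75, 0.85, 0.80]]

# Hypothetical test attempts and their true labels (True = legitimate).
ATTEMPTS = [[0.21, 0.29, 0.26], [0.78, 0.80, 0.85],
            [0.24, 0.31, 0.20], [0.70, 0.75, 0.90]]
LABELS = [True, False, True, False]

for dist in (euclidean, squared_euclidean, manhattan):
    decisions = [pnn_is_authentic(a, AUTHENTIC, IMPOSTOR, 0.1, dist)
                 for a in ATTEMPTS]
    print(dist.__name__, evaluate(decisions, LABELS))
```

As a purely arithmetic example of the measures, 6 legitimate attempts with 3 rejections and 11 impostor attempts with 1 acceptance give FRR = 50%, FAR of about 9.1%, MER of about 29.5%, and an accuracy of 13/17, about 76.5%.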
Table 2: Experiment 1 Results of "computer" Password

| Distance | Metrics | DT | FT | DT+FT | UUT |
|---|---|---|---|---|---|
| Euclidean | FRR% | 0 | 0 | 0 | 0 |
| Euclidean | FAR% | 0 | 0 | 0 | 0 |
| Euclidean | MER% | 0 | 0 | 0 | 0 |
| Euclidean | Accuracy% | 100 | 100 | 100 | 100 |
| Squared Euclidean | FRR% | 0 | 0 | 0 | 0 |
| Squared Euclidean | FAR% | 0 | 0 | 0 | 0 |
| Squared Euclidean | MER% | 0 | 0 | 0 | 0 |
| Squared Euclidean | Accuracy% | 100 | 100 | 100 | 100 |
| Manhattan | FRR% | 0 | 0 | 0 | 0 |
| Manhattan | FAR% | 0 | 0 | 0 | 0 |
| Manhattan | MER% | 0 | 0 | 0 | 0 |
| Manhattan | Accuracy% | 100 | 100 | 100 | 100 |

Table 3: Experiment 1 Results of "comp.84-rl" Password

| Distance | Metrics | DT | FT | DT+FT | UUT |
|---|---|---|---|---|---|
| Euclidean | FRR% | 0 | 0 | 0 | 0 |
| Euclidean | FAR% | 0 | 0 | 0 | 0 |
| Euclidean | MER% | 0 | 0 | 0 | 0 |
| Euclidean | Accuracy% | 100 | 100 | 100 | 100 |
| Squared Euclidean | FRR% | 0 | 0 | 0 | 0 |
| Squared Euclidean | FAR% | 0 | 0 | 0 | 0 |
| Squared Euclidean | MER% | 0 | 0 | 0 | 0 |
| Squared Euclidean | Accuracy% | 100 | 100 | 100 | 100 |
| Manhattan | FRR% | 0 | 0 | 0 | 0 |
| Manhattan | FAR% | 0 | 0 | 0 | 0 |
| Manhattan | MER% | 0 | 0 | 0 | 0 |
| Manhattan | Accuracy% | 100 | 100 | 100 | 100 |

8. RESULTS ANALYSIS
The PNN is used for keystroke dynamics authentication. The results of using the PNN with different types of distance show its ability to distinguish authentic users from impostors. The results of experiment 1 reflect a good level of training, which is needed to obtain high performance from the proposed approach. The results of experiment 2 indicate that DT is the worst keystroke dynamics feature. On the other hand, the experiment 2 results indicate that UUT produces results that are better than or equal to those of the features used in previous works, namely DT, FT, and the combination of DT and FT. Finally, the best results are obtained with UUT, which outperforms the others because UUT implicitly contains the two other features, DT and FT. It therefore forms a new feature built from the previous two, which gives it more capability to discriminate authentic users from impostors.
Furthermore, this best result is obtained with fewer network nodes than are needed for the combination of the DT and FT features. The FAR%, FRR%, MER%, and Accuracy of the proposed approach show that it is possible to improve the password security mechanism by considering not just the combination of DT and FT but also the UUT feature.

9. CONCLUSIONS
From the results, one can conclude, in general, the following:
1- The DT, FT, and UUT features are candidates for distinguishing between authentic users and impostors and may be exploited to reinforce password security.
2- The keystroke dynamics features DT, FT, the combination of DT and FT, and UUT are extracted. Among them, the UUT feature gives the best results.
3- The accuracy of the presented work with the PNN is close to meeting the acceptable error levels that would be required for a system with some degree of security.
4- When the PNN is applied with three types of distance metric (Euclidean distance, Euclidean squared distance, and Manhattan distance), the results show that the Euclidean squared distance produces the best results.

Table 4: Experiment 2 Results of "computer" Password

| Trial no. | Distance | Metrics | DT | FT | DT+FT | UUT |
|---|---|---|---|---|---|---|
| 1 | Euclidean | FRR% | 50 | 16 | 16 | 16 |
| 1 | Euclidean | FAR% | 9 | 0 | 0 | 0 |
| 1 | Euclidean | MER% | 29 | 8 | 8 | 8 |
| 1 | Euclidean | Accuracy% | 76 | 94 | 94 | 94 |
| 1 | Squared Euclidean | FRR% | 16 | 0 | 0 | 0 |
| 1 | Squared Euclidean | FAR% | 27 | 0 | 0 | 0 |
| 1 | Squared Euclidean | MER% | 21 | 0 | 0 | 0 |
| 1 | Squared Euclidean | Accuracy% | 76 | 100 | 100 | 100 |
| 1 | Manhattan | FRR% | 50 | 33 | 33 | 16 |
| 1 | Manhattan | FAR% | 9 | 0 | 0 | 0 |
| 1 | Manhattan | MER% | 29 | 16 | 16 | 8 |
| 1 | Manhattan | Accuracy% | 76 | 88 | 88 | 93 |
| 3 | Euclidean | FRR% | 61 | 38 | 27 | 27 |
| 3 | Euclidean | FAR% | 9 | 0 | 0 | 0 |
| 3 | Euclidean | MER% | 35 | 19 | 13 | 13 |
| 3 | Euclidean | Accuracy% | 72 | 86 | 90 | 90 |
| 3 | Squared Euclidean | FRR% | 33 | 27 | 27 | 16 |
| 3 | Squared Euclidean | FAR% | 24 | 3 | 0 | 0 |
| 3 | Squared Euclidean | MER% | 28 | 15 | 13 | 8 |
| 3 | Squared Euclidean | Accuracy% | 72 | 88 | 90 | 94 |
| 3 | Manhattan | FRR% | 61 | 50 | 50 | 33 |
| 3 | Manhattan | FAR% | 9 | 0 | 0 | 0 |
| 3 | Manhattan | MER% | 35 | 25 | 25 | 16 |
| 3 | Manhattan | Accuracy% | 72 | 82 | 82 | 88 |
Table 5: Experiment 2 Results of "comp.84-rl" Password

| Trial no. | Distance | Metrics | DT | FT | DT+FT | UUT |
|---|---|---|---|---|---|---|
| 1 | Euclidean | FRR% | 50 | 0 | 16 | 0 |
| 1 | Euclidean | FAR% | 30 | 10 | 0 | 0 |
| 1 | Euclidean | MER% | 40 | 5 | 8 | 0 |
| 1 | Euclidean | Accuracy% | 62 | 93 | 93 | 100 |
| 1 | Squared Euclidean | FRR% | 50 | 0 | 16 | 0 |
| 1 | Squared Euclidean | FAR% | 30 | 10 | 0 | 0 |
| 1 | Squared Euclidean | MER% | 40 | 5 | 8 | 0 |
| 1 | Squared Euclidean | Accuracy% | 62 | 93 | 93 | 100 |
| 1 | Manhattan | FRR% | 33 | 0 | 16 | 0 |
| 1 | Manhattan | FAR% | 20 | 20 | 10 | 0 |
| 1 | Manhattan | MER% | 26 | 10 | 13 | 0 |
| 1 | Manhattan | Accuracy% | 75 | 87 | 87 | 100 |
| 3 | Euclidean | FRR% | 55 | 5 | 5 | 11 |
| 3 | Euclidean | FAR% | 30 | 10 | 10 | 3 |
| 3 | Euclidean | MER% | 42 | 7 | 7 | 7 |
| 3 | Euclidean | Accuracy% | 60 | 91 | 91 | 93 |
| 3 | Squared Euclidean | FRR% | 66 | 11 | 5 | 11 |
| 3 | Squared Euclidean | FAR% | 23 | 10 | 13 | 3 |
| 3 | Squared Euclidean | MER% | 45 | 10 | 9 | 7 |
| 3 | Squared Euclidean | Accuracy% | 60 | 89 | 89 | 93 |
| 3 | Manhattan | FRR% | 55 | 11 | 11 | 16 |
| 3 | Manhattan | FAR% | 26 | 13 | 10 | 3 |
| 3 | Manhattan | MER% | 41 | 12 | 10 | 10 |
| 3 | Manhattan | Accuracy% | 62 | 87 | 89 | 91 |

REFERENCES
[1] U. Dieckmann and R. W. Frischholz, "BioID: A Multimodal Biometric Identification System", IEEE Computer, Vol. 33, No. 2, pp. 64-68, 2000.
[2] F. Monrose and A. D. Rubin, "Keystroke Dynamics as a Biometric for Authentication", Future Generation Computer Systems (FGCS), Vol. 16, No. 4, pp. 351-359, 2000.
[3] L. O'Gorman, "Comparing Passwords, Tokens, and Biometrics for User Authentication", Proceedings of the IEEE, Vol. 91, No. 12, pp. 2019-2040, 2003.
[4] Y. Chen and A. Jain, "Beyond Minutiae: A Fingerprint Individuality Model with Pattern, Ridge and Pore Features", in ICB09, 2009.
[5] H. Mendez, C. Martin, J. Kittler, Y. Plasencia, and E. Garcia Reyes, "Face Recognition with LWIR Imagery Using Local Binary Patterns", in Proceedings of the International Conference on Advances in Biometrics, 2009.
[6] L. C. F. Araujo, L. H. R. Sucupira Jr., M. G. Lizarraga, L. L. Ling, and J. B. T. Yabu-Uti, "User Authentication Through Typing Biometrics Features", IEEE Transactions on Signal Processing, Vol. 53, No. 2, pp. 851-855, 2005.
[7] D. Davies and W. Price, "Security for Computer Networks", John Wiley & Sons, Inc., 1989.
[8] N. Abdullah and A. M. Ahmad, "User Authentication via Neural Networks", in Proceedings of the 9th International Conference on Artificial Intelligence: Methodology, Systems, and Applications, pp. 310-320, 2000.
[9] D. C. D. Souza, "Typing Dynamics Biometric Authentication", Bachelor of Engineering thesis, Faculty of Engineering and Physical Sciences, University of Queensland, Australia, 2002.
[10] L. C. Change, L. W. Kin, and L. C. Peng, "Keystroke Patterns Classification Using the ARTMAP-FD Neural Network", in Proceedings of the 3rd International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIHMSP), Vol. 2, pp. 61-64, 2007.
[11] P. S. Tee, T. S. Ong, and A. B. J. Teoh, "A Multilayer Layer Fusion Approach on Keystroke Dynamics", Springer, Vol. 14, pp. 23-36, 2011.
[12] H. Barghouthi, "Keystroke Dynamics: How Typing Characteristics Differ from One Application to Another", MSc in Information Security, Gjøvik University College, Norway, 2009.
[13] S. Steven Richard, "A New Approach to Securing Passwords Using a Probabilistic Neural Network Based on Biometric Keystroke Dynamics", Ph.D. thesis, Department of Electrical and Electronic Engineering, University of Newcastle upon Tyne, UK, 2003.
[14] G. Romain, M. El-Abed, and R. Christophe, "Keystroke Dynamics Authentication", International Symposium on Collaborative Technologies and Systems, France, 2009.
[15] F. Monrose and A. D. Rubin, "Keystroke Dynamics as a Biometric for Authentication", Future Generation Computer Systems, Vol. 16, No. 4, pp. 351-359, 2000.
[16] R. Kenneth, "User Authentication via Keystroke Dynamics: an Artificial Immune System Based Approach", in Proceedings of the 5th International Conference on Information Technology, 2011.
[17] I. Anagnostopoulos, "Classifying Web Pages Employing a Probabilistic Neural Network", in IEE Proceedings Software, Vol. 151, No. 3, pp. 139-150, 2004.