International Journal of Application or Innovation in Engineering & Management... Web Site: www.ijaiem.org Email: , Volume 2, Issue 11, November 2013

advertisement
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 11, November 2013
ISSN 2319 - 4847
Probabilistic Neural Network for User
Authentication Based on Keystroke Dynamics
Sarab M. Hameed1, Mais M. Hobi2 and Sumaya Saad3
1,2,3
Computer Science Department, Baghdad University, Baghdad, Iraq
Abstract
Computer systems and networks are increasingly used for many types of applications; as a result the security threats to computers
and networks have also increased significantly. Traditionally, password user authentication is widely used to authenticate
legitimate user, but this method has many loopholes such as password sharing, brute force attack, dictionary attack and more.
The aim of this paper is to improve the password authentication method using Probabilistic Neural Networks (PNNs) with three
types of distance include Euclidean Distance, Manhattan Distance and Euclidean Squared Distance and four features of
keystroke dynamics including Dwell Time (DT), Flight Time (FT), mixture of (DT) and (FT), and finally Up-Up Time (UUT). The
results illustrate that Euclidean Squared Distance with (UUT) feature provide low error rate and high accuracy compared with
the other two types of distances used.
Keywords: Biometrics, Keystroke Dynamics, Probabilistic Neural Network, User Authentication.
1. INTRODUCTION
Internet applications use an authentication scheme to make sure that only genuine individual can login to the application
[1]-[2]. User authentication is the process of validating claimed identity for the purpose of performing trusted
communications between parties for computing application [3].Biometrics offers an automated methods for authentication
and identification based on physiological or behavioral characteristics [4]-[5].
The keystroke dynamics authentication is based on the idea that each user has unique keystroke latency pattern which
is different from others [6]. There are two types of keystroke dynamics: first one is the static keystroke dynamics in which
the typed data is fixed and the typed time information is also fixed during login time. While the second one is continuous
keystroke dynamics in which individuals are authenticated independently of what they are typing on the keyboard and the
typing characteristics are analyzed during complete session [7].
This paper is organized as follows. Section two presents related work. Section 3 illustrates the features of keystroke
dynamics. Section 4 presents the proposed approach; section 5 describes the proposed approach for user authentication,
while the evaluation criteria explained in section 6. Finally the experiment results, results analysis and conclusions are
given in sections 7, 8, and 9 respectively.
2. RELATED WORK
There are several previous works that may have a relation in one way or another to the present work.
-In[8], a comparison between ADALINE(Adaptive Linear Element) based on the single perceptron model and the
BPNN(Back Propagation Neural Networks) model, using both the FT and digraph latency time, concluded that the BPNN
surpass the ADALINE which was not capable of classifying patterns.
-In [9], a statistical method was used to classify users to legitimate users or impostors. The extracted FT feature was used
and 63 users participated in their experiment.
-In [10], a statistical-based comprehensive study was carried out and presented the development of a keystroke dynamicsbased user authentication system using the neural network with FT feature. The deduction was that neural network-based
methods have better results as compared with statistical methods in keystroke patterns classification.
-In [11], user’s typing biometrics was measured using a fuzzy logic. This experiment involved 29 users who provided
their user ID (Identification) and password of length eight or more. Common DT were used and the typing difficulty
between two successive keystrokes is calculated. One main disadvantage of using typing difficulty as keystroke feature
was to set the categories of difficulty, as there are wide ranges of possibilities to define the typing difficulty.
3. KEYSTROKE DYNAMICS FEATURES
The main purpose of the feature extraction phase is extracting vital features from the timestamp collected from raw
keystroke data for template generation. The extracted features are certainly the critical means with regards to keystroke
dynamics based biometrics and directly influence the performance of the classifier [11]. Extracting the right features
from a dataset can reduce the computational complexity of the problem [14].Each sample in the dataset is represented by
a sequence of timing information expressing the exact time at which keys have been pressed and released. There are
many common types of features which can be extracted from a human keystroke as follows [11]-[12]:1- Dwell time: It is the time to measure how long a key is pressed (Down) until it is released (Up).
Volume 2, Issue 11, November 2013
Page 342
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 11, November 2013
ISSN 2319 - 4847
2- Flight time: It is the Interval between a key release (Up) and next key press (Down) time.
3- Down-Down: It is the Interval between two successive key presses (Down).
4- Up-Up: It is the Interval between two successive key releases.
5- Tri-graph: Is the elapsed time between the first key press (Down) and the third key press (Down).
6- Placement of the fingers: a camera is required in this type.
7- Pressure of keystroke: In this case a special type of pressure which has a sensitive keyboard needs to be used.
4. PROBABILISTIC NEURAL NETWORKS
The PNN consists of input layer, pattern layer, summation layer, and output layer. The number of nodes in input layer
depends on length of timing information vector for specific user. The pattern layer is designed to contain one neuron
(node) for each training sample available and the neurons are split into the two classes. Each neuron in the pattern layer
computes a distance measure between the presented input vector and the training example represented by that pattern
neuron. The summation layer contains one neuron for each class, while the output layer contains one neuron. PNN
algorithm is started with read timing information vector (X) and feed it to each Gaussian function in each class, then for
each group of hidden nodes, compute all Gaussian functional values at the hidden nodes as illustrated in equation(2)[17].
p
i ( x )
D

 
2


 exp
2



(2)
Where
pi : represents the output of a pattern node.
x : is the timing information vector to be assigned into class ci .
 : smoothing factor.
D : represents the distance between the timing information vector x and the sample vector (reference vector) y is
computed in this paper using three types of distance measures as illustrated in the following equations [15]-[16]:
1-Euclidean Distance: is one of the most popular distance metric between two vectors x and y and is computed as in
equation (3).
D ( x , y ) 

x

y
2
(3)
2-Euclidean Squared Distance: uses the same equation as the Euclidean distance metric, but does not take the square
root.
3-Manhattan Distance: the Manhattan (or city block) distance between vector x and vector y is calculated as in equation
(4).
D (x, y) 
y  x

(4)
After these steps, for each group of hidden nodes, feed all its Gaussian functional values to the single output node for that
group as illustrated in equation(5).
y
i
1
( x) 
n
n

j
j i1
p
i
( x)
(5)
Where
yi
nj
: represents the output of summation node i for
classi
: represents the number of samples in pattern layer of
classi
pi
: represents the output of pattern node i
Finally, find maximum value of all summed functional values at the output nodes comparing the values of Y1(x) and
Y2(x).
If Y1(x) > Y2(x), then is assigned to class1;otherwise class2.
Volume 2, Issue 11, November 2013
Page 343
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 11, November 2013
ISSN 2319 - 4847
5. PROPOSED APPROACH FOR USER AUTHENTICATION
The aim of this section is to apply PNN for keystroke dynamics. This section, also, clarifies the attributes that can be
extracted from the users based on their typing styles that can be employed to maximize subtle differences between the
typing styles of users.
5.1Keystroke Dataset Collection Phase
An essential part of any keystroke dynamics system is the acquisition of users keystroke data from the keyboard [13].
Therefore the keyboard property is set to enable the proposed system to distinguish between authentic users and impostors
in an accurate way. The dataset acquired is a collection of timing of keystroke for specific passwords over a period of
three months. The users are randomly selected from the staff of Baghdad University/college of science to participate
building the required datasets. Two datasets are created. The first one is named Keystroke-1and the second one is
Keystrok-2.
In Keystroke-1, 17 users are asked to type the password "computer" twenty five times. Five times (i.e. five samples) per
week's session, i.e. five weeks are required to collect the dataset for each user. The user types the password in different
sessions because there is a chance that when the user types continuously the same password again and again, the typing
speed may be increased. Further, to take the probability of all user variations and circumstances.
However, the password “computer” is considered weak because of the relative ease with which a third party can guess
them or find them via dictionary attacks. Hence, another dataset, Keystroke-2, is built up in a similar way to Keystroke-1,
but 16 users were participated (thirteen of them are similar to those who participated keystroke-1). Here the users were
asked to type password “comp.84-rl” which satisfies the strong password selection criteria, i.e., at 10 characters in length
combining symbols and numbers.
5.2 Feature Extraction phase
In this paper, DT, FT, and UUT features are extracted from the collected dataset. Figure (1) depicts the three major
features that have been extracted when the users type the "computer" password. The same representation can be repeated
with the second password”comp.84-rl”.
Figure 1 Features Representation of "computer" Password
The length of the timing information vector is different and depends on the length of the password, for example, a
password “computer” which contains eight characters will result in eight DT, seven FT and seven UUT. Generally, a
password with n character will yield n number of DT and n - 1 number for FT and UUT.
5.3 Preprocessing
To enable the proposed system to distinguish between authentic user and imposter with minimum error rate, a
normalization process is performed as shown in equation (1)

F 
F  MinF
MaxF  MinF
(1)
where
F: is the feature value,
MinF: is the minimum value that the feature F can get, and
MaxF: is the maximum value that the feature F can get.
5.4 Training Phase
This section illustrates how PNN was used in proposed system..Table (1) shows the number of nodes in each layer when
the features of "computer" password are extracted. The first row represents the information of PNN structure when DT
feature is used. The number of input layer nodes represents the length of information timing vector according to specific
feature. The number of pattern layer nodes represents the total number of training samples (users).
Volume 2, Issue 11, November 2013
Page 344
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 11, November 2013
ISSN 2319 - 4847
Table 1 : PNN Information Structures for "computer" Password
Feature(s)
Input layer
Pattern layer
Summation layer Output layer
Nodes
Nodes
Nodes
Nodes
DT
8
17
2
1
FT
7
17
2
1
DT+FT
15
17
2
1
UUT
7
17
2
1
In this paper, there are two classes, the first class for authorized users and the second one for impostors. For example, in
Keystroke-1 there are 17 users, the first 6 users are authorized and the remaining 11 users are impostors when the
"computer" password is used. On the other hand, in Keystroke-2 there are 16 users the first 6 users are authorized and the
remaining 10 users as impostors.
The main aim of training phase of PNN is to find a proper value of smoothing parameter  . In this paper, an algorithm
is proposed to determine the proper range for  as clarified in algorithm (1). After the proper range for  is obtained,
the training samples in pattern layer nodes are presented as input information vector in input layer. Then change 
value within proper range and apply PNNalgorithm continuously until reaching to least error rates in classification. This
value of  is considered the best value for good classification .
Algorithm (1): Smoothing Parameter Range Determination
 
Input: Set of training timing information vectors p i , j .
 
Output: Smoothing parameter  .
Step1: [Find Mean Vector]
Compute the mean (centeroid) vector for each class
c k , 1<= k<=2.
k 
1
Nk
p
i, j
.
Where
N : is the total number of patterns with specific class
pi, j
j
i
: is the feature number of pattern number
Step2: [Find Standard Deviation Vector]
Compute the standard deviation vector for each class, k.
std k 
1
Nk
2
 
i
 pi, j  .
where
 i : is the ith value of mean vector
Step3: [Find Range]
Find the minimum and maximum standard deviation value for each class to obtain the proper range of
  value.
5.5 Authentication Phase
Authentication process consists of verifying if the input of the user corresponds to the claimed identity. The way of
capturing these inputs greatly depends on the kind of the keystroke dynamics used (i.e., in this paper for example the user
must type the password). The features are extracted from the raw biometric sample and are compared to the model of the
claimed user. By this way, two authentication factors are used namely knowledge based, which is the password of the user
and biometric based, which is the way of typing the password. As mentioned previously in PNN training section the
optimum value of smoothing parameter  is obtained in order to use this value in PNN authentication phase. The user is
asked to type the password. When the user types the correct password, the PNN authentication begins to check whether
the user is authentic or not. The timing information vector is extracted. Then the timing vector is presented to the input
layer nodes. At pattern layer, the similarity score between a reference template and a test data template is computed. The
Volume 2, Issue 11, November 2013
Page 345
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 11, November 2013
ISSN 2319 - 4847
output score of pattern layer node has the range of [0 1]. Then this output score is put forward into summation layer.
Finally, the decision is made at output layer to classify the user either as authentic or imposter.
6. EVALUATION CRITERIA
To evaluate the predictive performance of the proposed biometric authentication system four measures are calculated.
These measures are called False Reject Rate (FRR), False Accept Rate (FAR), Mean Error Rate (MER) and Accuracy.
The formulas for calculating each of these measures are given as in equations (6), (7), (8) and (9) respectively [12].
FRR: is defined as the rate at which users are rejected when they could be authenticated.
FRR 
Number of rejected ligitimate users
Total number of ligitimate attempts
(6)
FAR: is defined as the rate at which users are accepted when they should be rejected.
FAR 
Number of accepted impostor attempts
Total number of impostor attempts
(7)
MER: is the mean rate of FAR and FRR.
MER

FAR
 FRR
2
(8)
Accuracy: is defined as the proportion of true results in the population.
Accuracy

Number of correct attempts
Total number of attempts
(9)
7. EXPERIMENTAL RESULTS
Two experiments are conducted independently. In each experiment the two passwords “computer” and “comp.4-rl” are
used by PNN. In the first experiment was tested on the same sample in the training dataset i.e. Keystrok-1 and Keystrok2. Each sample contains mean values of keystroke timing vectors. The results of this experiment were obtained with error
rates equal to 0% and Accuracy 100% of two passwords "computer" and "comp.84-rl" with three types of distances, as
shown in tables (2-3).
The second experiment deals with testing the proposed approach on line. This experiment includes computing the
selected feature(s) for each key he/she typed. Then the preprocessing on the computed vector time is applied. Moreover,
this experiment involves the testing when the proposed approach uses the first trial of password typing, and when it uses
three trials of password typing. The results of this experiment are shown in table3( 4-5).
“computer”
Manhattan
Euclidian
Squared
Euclidian
Distance
Password
Table 2: Experiment1 Results of “computer” Password
Metrics
DT
FT
DT+F UUT
T
FRR%
0
0
0
0
FAR%
MER%
Accuracy%
0
0
100
0
0
100
0
0
100
0
0
100
FRR%
0
0
0
0
FAR%
MER%
Accuracy%
0
0
100
0
0
100
0
0
100
0
0
100
FRR%
0
0
0
0
FAR%
MER%
Accuracy%
0
0
100
0
0
100
0
0
100
0
0
100
Table 3: Experiment1 Results of “comp.84-rl” Password
Volume 2, Issue 11, November 2013
Page 346
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 11, November 2013
ISSN 2319 - 4847
DT
FT
DT
+FT
UU
T
FRR%
FAR%
MER%
Accuracy
%
FRR%
FAR%
MER%
Accuracy
%
FRR%
FAR%
MER%
Accuracy
%
0
0
0
100
0
0
0
100
0
0
0
100
0
0
0
100
0
0
0
100
0
0
0
100
0
0
0
100
0
0
0
100
0
0
0
100
0
0
0
100
0
0
0
100
0
0
0
100
“comp.84-rl”
Manhattan
Euclidian
Squared
Euclidian
Distance
Password
Metrics
8. RESULTS ANALYSIS
The PNN is used as keystroke dynamics authentication. The results of using PNN with different types of distances show
the ability to distinguish authentic users from impostors. The results of experiment1 reflect a good training level in order
to obtain high performance of proposed approach..The results of experiment2 give the indication to that the DT feature is
the worst keystroke dynamic feature. On other hand experiment2 results indicate that UUT producethe best or equal
results as compared with other features that are used in previous works DT, FT, and combination of DT and FT.
Finally, the best results are obtained with UUT out performs others because UUT implicitly contains the two other
features DT, and FT; that leads to build a new feature from the previous two features making the last feature having more
capability to discriminate the authentic users from the impostors. Furthermore, this best result is obtained with fewer
network net nodes when compared with combination of DT and FT features.The FAR%, FRR%, MER% and Accuracy of
proposed approach evidence that it is possible to improve the password security mechanism considering not just the
combination of DT and FT but also UUT feature.
9. CONCLUSIONS
From the results, one can conclude, in general, the following:
1-The DT, FT, and UUT features are candidate for distinguishing between authentic users and imposter one and may be
exploited to reinforce password security.
2-The keystroke dynamics features namely DT, FT, combination of DT and FT and UUT are extracted. However UUT
feature satisfies the best results among the others.
3-The accuracy of the presented work with PNN is close to meet acceptable error levels that would be required for a
system with some degree of security.
4-When PNN with three types of distance metric (Euclidean distance, Euclidean squared distance, and Manhattan
distance) is applied the results show that the Euclidean squared distance produced the best results.
Volume 2, Issue 11, November 2013
Trail no.
Squared
Euclidean Distance
Euclide
an
“computer"
Password
Table 4 Experiment2 Results of “computer” Password
1
Metrics
DT
FT
DT
+F
T
UU
T
FRR%
FAR%
MER%
Accuracy
%
FRR%
FAR%
MER%
50
9
29
16
0
8
16
0
8
16
0
8
76
94
94
94
16
27
21
0
0
0
0
0
0
0
0
0
Page 347
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Squared
Euclidean
Euclidean
Manhattan
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 11, November 2013
ISSN 2319 - 4847
Manhattan
3
Accuracy
%
FRR%
FAR%
MER%
Accuracy
%
FRR%
FAR%
MER%
Accuracy
%
FRR%
FAR%
MER%
Accuracy
%
FRR%
FAR%
MER%
Accuracy
%
76
100
100
100
50
9
29
33
0
16
33
0
16
16
0
8
76
88
88
93
61
9
35
38
0
19
27
0
13
27
0
13
72
86
90
90
33
24
28
27
3
15
27
0
13
16
0
8
72
88
90
94
61
9
35
50
0
25
50
0
25
33
0
16
72
82
82
88
Trail no.
Password
Distance
Table 5 Experiment2 Results of “comp.84-rl” Password
Metrics
DT FT
DT
UU
+F
T
T
“comp.84-rl"
Manhatt Squared
an
Euclidean
Euclidean
Manhattan
Squared
Euclidean
Euclidean
1
Volume 2, Issue 11, November 2013
3
FRR%
FAR%
MER%
Accuracy
%
FRR%
FAR%
MER%
Accuracy
%
FRR%
FAR%
MER%
Accuracy
%
FRR%
FAR%
MER%
Accuracy
%
FRR%
FAR%
MER%
Accuracy
%
FRR%
FAR%
MER%
50
30
40
62
0
10
5
93
16
0
8
93
0
0
0
100
50
30
40
62
0
10
5
93
16
0
8
93
0
0
0
100
33
20
26
75
0
20
10
87
16
10
13
87
0
0
0
100
55
30
42
60
5
10
7
91
5
10
7
91
11
3
7
93
66
23
45
60
11
10
10
89
5
13
9
89
11
3
7
93
55
26
41
11
13
12
11
10
10
16
3
10
Page 348
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 11, November 2013
ISSN 2319 - 4847
Accuracy
%
62
87
89
91
REFERENCES
[1] U. Dieckman N and R.W. Frischholz “BioID: A Multimodal Biometric Identification Systems” IEEE Computer,
Vol.33, No. 2,pp.64-68, 2000 .
[2] F. Monrose and A. D., Rubin “Keystroke Dynamics as a Biometric for Authentication” Future Generation Computing
Systems (FGCS), Vol.12, No 12, pp.351-359, 2000.
[3] L. O’Gorman Comparing Passwords, Tokens, and Biometrics for User Authentication, Proccedings of the
IEEE,Vol.91,No.12, pp.2019-2040, 2003.
[4] Y. Chen and A. jain, “Beyond minutiae: A fingerprint individuality model with pattern, ridge and pore features,” in
ICB09,2009.
[5] H. Mendez, C.Martin, J. Kittler,Y. Plasencia, and E. Garcia Reyes, “Face recognition with lwirimagery using local
binary patterns,” in Proceedings of the International Conference on Advances in Biometrics, 2009.
[6] L. C. F Araujo,H. R. LuizSucupirajr., Miguel G. Lizarrage, Lee L. Ling, and Joao B. T. Yabu-Uti, “User
Authentication Through Typing Biometrics Features”, IEEE Transactions on Signal Processing, Vol. 53,No.2,
pp.851-855, 2005.
[7] D. Davis and W. Price, “Security for Computer Networks”, John Wiley & Sons, Inc., 1989.
[8] N. Abdullah, A.M. Ahmad, “User Authentication via Neural Networks”, in Proceedings of the 9th International
Conference on Artificial Intelligence: Methodology, Systems, and Applications, pp. 310-320, 2000.
[9] D. C. D. Souza, “Typing Dynamics Biometric Authentication”, Bachelor engineering thesis, Faculty of Engineering
and Physical Sciences, University of Queensland, Australia, 2002.
[10] L. C. Change., L. W. Kin, and L. C. Peng, "Keystroke Patterns Classification Using the ARTMAP-FD Neural
Network", In proceeding of 3rd International Conference on Intelligent Information Hiding and Multimedia Signal
Processing, IIHMSP, Vol. 2, pp. 61-64, 2007.
[11] P.S. Tee.,T.S. Ong. andA. B. J. Teoh, “A Multilayer Layer Fusion Approach on Keystroke Dynamics”, Springer,
Vol. 14, pp. 23-36, 2011.
[12] H. Barghouthi, “Keystroke Dynamics How Typing Characteristics Differ from One Application to Another”, Msc in
Information Security, Gjøvik University College, Norway, 2009.
[13] S.Steven Richard, "A New Approach to Securing Passwords Using a Probabilistic Neural Network Based on
Biometric Keystroke Dynamics", Ph.D thesis, University of Newcastle upon Tyne, UK, The Department of Electrical
and Electronic Engineering,2003.
[14] G. Romain,M. El-Abed and R.,Christophe, "Keystroke Dynamics Authentication", International Symposium on
Collaborative Technologies and Systems, France, 2009.
[15] F.Monrose and A. D. Rubin, “Keystroke Dynamics as a Biometric for Authentication”, Future Gener Compute Syst
Vol. 16, No.4, pp. 351– 359, 2000.
[16] R. Kenneth, "User Authentication via Keystroke Dynamics: an Artificial Immune System Based Approach", in
Proceedings of 5th International Conference on Information Technology, 2011.
[17] I.Anagnostopoulos, "Classifying Web Pages Employing A Probabilistic Neural Network", in IEEE Proceedings
Software, Vol. 151, No. 3, pp. 139-150, 2004.
Volume 2, Issue 11, November 2013
Page 349
Download