Keystroke Biometric System Test Taker Setup and Data Collection
Hassan Poorshatery, Geoffrey Garcia, Elizabeth Teracino, Xiaolu Zhao, Vinnie Monaco, John Stewart,
and Charles Tappert
{hp47222n, gg30810n, et75813p, zx84933n, jm51645n, ctappert}@pace.edu
Seidenberg School of CSIS, Pace University, White Plains, NY, 10606, USA
Keystroke dynamics are the patterns of rhythm
and timing created when a person types. They
include overall speed, variations in speed when moving
between specific keys, common errors, and the length
of time that keys are held down. The system records
when each key is pressed and when it is released,
which gives the duration of each keystroke and the
latency between consecutive keystrokes. This rhythm is
believed to be unique to an individual and is captured
to develop a biometric template for the future
authentication of that same individual.
Abstract
Pace University’s Seidenberg School of Computer
Science and Information Systems (CSIS) has
developed, over the past seven years, a robust Pace
Keystroke Biometric System (PKBS) that identifies
and authenticates users by their typing rhythms and
patterns, which can be used to differentiate
between users.
The PKBS consists of three components: the
Keystroke Entry System (KES) that collects raw
keystroke data over the Internet, the Keystroke
Feature Extractor (KFE) that extracts feature vectors
from the raw data, and the Keystroke Pattern
Classifier (KPC) that is used in the authentication
process.
The work described in this paper focuses on
enhancements to the keystroke entry system to
support a real-world application to authenticate
students taking online tests.
Pace University has been researching this method
of identification and authentication via
experimentation and the implementation of the PKBS
software for more than seven years. With the
increase of enrolment in online education there is a
concern for evaluation security and academic
integrity. [4] Ensuring that students are who they say
they are during online examinations is extremely
important. In a similar vein, this sort of
authentication could be used for the same reasons
when training and orientation examinations are
administered in a business setting.
1. Introduction
Keystroke biometrics is one of the least studied of
the biometrics; however, this has been changing in
recent years due to the increase in online testing and
email monitoring in corporate settings. It is a
behavioural biometric that can be used for
identification and authentication purposes. User
authentication is a process that determines whether
to confirm or deny a user’s claimed identity. [5]
Passwords are a common form of authentication;
however, user authentication can also be
accomplished with biometric systems via what you
are (e.g. fingerprints, iris) or how you behave
(e.g. handwriting, signature, typing rhythm). An
advantage of using a keystroke behavioural biometric
system over alternatives is that the only piece of
hardware needed is a keyboard, making this an
inexpensive tool.
The Keystroke Biometric System process begins
with an initial training period in which users register
and log in to the system, where they are asked to
answer a set of practice questions to gather initial
sampling data. Later, another set of tests is required
to test the authentication of the users against the
initial samples. The KES was revised and updated to use
JavaScript instead of the original Java applet
configuration in order to eliminate the user’s need to
have Java installed and to simplify the data entry
process on the user’s end.
Feature measurements are extracted from the raw
data using the Keystroke Feature Extractor program
(KFE) and are then processed by the Keystroke
Pattern Classifier (KPC) which uses the k-Nearest
Neighbour classifier. [1] The collected raw keystroke
data samples (test) are processed and compared
against the archived enrolment samples (train) to
make an authentication decision. The test taker is
either matched (accepted) or not matched (rejected).
The system is first initiated by the instructor, who
enters the course name and questions into a text file
called “Prompts.txt”. Next, the students log in to the
KES website, which is the test-taking environment,
and register as new users. During this registration
phase, the student is asked to enter their first
name, last name, whether they are right- or
left-handed, and whether they are on a laptop or
desktop. Registered students can then log in to the
site to take the test, answering the questions from
the questions file. The KES displays each question
while JavaScript event handlers monitor the text
input area where keystrokes entered by a student are
captured. After completing each question, the student
is required to submit their answer, which the system
saves together with the keystroke dynamics
information in text files on the hosting server.
The purpose of this paper is to explain how to turn
the KES system into a real-world application that can
be used for the authentication of students taking an
online test, with an additional focus on the
improvement of the accuracy of the system.
2. Revised KES Interface
The Keystroke Entry System (KES) collects raw
keystroke data over the Internet and has been
customized for use by students taking online tests.
Unlike the previous KES, the updated KES saves
new raw data text files that contain both the
keystroke codes and the answer to each question
from each student. After a data converter program
is run, the keystroke codes in these raw data files
are ready to be analyzed and compared using the
Biometric Authentication Feature Extractor and
Feature (data) Classifier to identify the keystroke
patterns of each test taker (student) and, finally, to
authenticate them. The actual answers as typed by the
students in the raw data file are used by the instructor
for grading purposes.
2.1. Changes to the KES
Figure 1 shows the components of the new KES
along with other components of the PKBS.
Figure 1: PKBS revised keystroke entry system
In the previous version of the KES, the four
experimental categories were copy task on a desktop,
copy task on a laptop, free-text entry on a desktop
and free-text entry on a laptop. At the completion of
registration or upon returning to the site, a user was
redirected to the activity selection page. The six
pieces of information sent to, and required by, the
Java applet included data such as experiment style,
sequence number, keyboard style, and awareness.
Lastly, the user was required to use Internet Explorer.
[5] The new Keystroke Entry System is a PHP-based
web application that uses JavaScript compatible with
the Mozilla Firefox browser (Figure 2). Unlike the
previous version, there is no need to have Java
installed on the user’s end.
Figure 2: KES home page
Figure 3: The data input screen
Following the application requirements and
recommendations from previous team research,
unnecessary information has been removed. For
example, students are not informed that their
keystrokes are captured during test taking for the
purpose of biometric authentication. Instead, the
keystrokes are simply acquired in background
(stealth) mode. Test answers from the KES (Figure 3)
are valid only if at least 200 keystrokes (reduced
from the 300 keystrokes used in previous research)
are collected. Upon completion of the entry the user
is thanked and the data output is stored as a .txt
file within the application structure in the format
<First>_<Last>_<Course Name>_PROMPT<Question #>.
In addition to the raw data file, an additional file
of the format <First>_<Last>_PROMPTS_COMPLETED is
created, which identifies all of the questions a user
has already answered. For returning users the file is
appended following the data entry to include which
additional questions have been answered.
2.2. Data Format
As can be seen in Figure 4, other than the file
name, which displays the course/test name and
question number, the raw data file generated from a
user’s input within the Keystroke Entry System
includes three sections of information:
1. Header area:
   Student or user’s name
2. Data Entry area:
   a. #: Sequence number for each keystroke
   b. Key: Character displayed onscreen: a, b, c…
   c. Keycode: ASCII character code corresponding
      to the Key field
   d. Press: Time of key press (ms)
   e. Release: Time of key release (ms)
   f. Duration (ms): Difference between the Press
      and Release times for a particular Sequence
   g. Latency (ms): Difference between the Press
      time for a Sequence and the Release time
      from the previous Sequence
3. Answer area:
   a. Unformatted copy of the user’s input
Notes:
- Key combinations, such as SHIFT + i, are
  recorded as individual sequences.
- Holding down a key results in a new
  sequence every 31 milliseconds following
  an initial 500 ms delay (depending on
  computer configuration); until the key is
  released, the release time is recorded as 0.
Figure 4: A raw data file of a student
2.3. Alpha and Beta Versions
In an effort to gather user feedback and test the
system, both alpha (initial) and beta (real)
environments were configured. The difference between
the two is that the alpha version is intended to test
the new KES and PKBS as a general-purpose
authentication system, using randomly selected
questions from a list of generic questions, while the
beta version is intended to work as an online
test-taking system, a real-world application of the
PKBS. The beta version uses test questions presented
in a particular order determined by the instructor,
and all questions need to be answered by the students.
For the alpha testing, four training data samples (free
text) were collected from each of a group of fourteen
users. The same group was asked for a second set of
four testing data samples a week later. Although the
questions are generic, this simulates taking an online
test and the data is marked as test data. For the beta
testing, a group of fourteen students from Lake Erie
College, under the supervision of Professor John
Stewart, were to be asked to submit sample training
data and then complete an actual test as their final
exam using the system. However, due to time
restrictions, we obtained both the training and
testing data in a single exam session, later splitting
the answered questions of each student in half into
training and testing sets.
3. Authentication Experimental Method
As Figure 5 shows, in order to determine
authentication accuracy the KBS uses a manually
operated, multi-step approach, which begins
with the capture of training and test data through the
Alpha and Beta Keystroke Entry Systems. Following
the collection of data, the raw files are input into the
Keystroke Feature Extractor, which determines the
key features of the training and testing data in a
single pass.
Figure 5: PKBS Experimental Procedure
A number of measurements or features are used to
characterize a user’s typing pattern. These features
are designed to describe an individual’s keystroke
dynamics over writing samples of at least 200
characters. The features characterize a user’s
key-press duration times, transition times in going from
one key to the next, the percentages of usage of the
non-letter keys and mouse clicks, and the typing
speed. The feature extractor program was designed
and implemented to extract 239 features. The
resulting output is a feature vector file, which is then
manually split in half into two files corresponding to
training and testing.
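As a rough illustration of the kind of measurements described above, the following Python sketch derives a handful of timing features from a list of keystroke records. It is not the KFE, which extracts 239 features; the `Keystroke` record, the `basic_features` function and the four features chosen here are assumptions made for the example. Duration and latency follow the definitions given for the raw data file.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Keystroke:
    key: str      # character shown on screen
    press: int    # key-press time in ms
    release: int  # key-release time in ms


def basic_features(sample):
    """Return a few illustrative timing features for one typing sample."""
    if len(sample) < 2:
        raise ValueError("need at least two keystrokes")
    # Duration: release time minus press time of the same keystroke.
    durations = [k.release - k.press for k in sample]
    # Latency: press time minus the release time of the previous keystroke
    # (negative when two keys overlap).
    latencies = [cur.press - prev.release for prev, cur in zip(sample, sample[1:])]
    elapsed_ms = sample[-1].release - sample[0].press
    non_letters = sum(1 for k in sample if not k.key.isalpha())
    return {
        "mean_duration_ms": mean(durations),
        "mean_latency_ms": mean(latencies),
        "non_letter_pct": 100.0 * non_letters / len(sample),
        "keys_per_second": 1000.0 * len(sample) / elapsed_ms,
    }


# Hypothetical four-keystroke sample (real samples need at least 200).
sample = [
    Keystroke("h", 0, 90),
    Keystroke("i", 150, 230),
    Keystroke(" ", 300, 400),
    Keystroke("a", 520, 600),
]
features = basic_features(sample)
```

A real feature vector would also break these statistics down per key and per key pair, which is where most of the discriminating power is expected to come from.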
Finally, the split files are input into the
Authentication Classifier to determine authentication
accuracy. The Biometric Authentication System (BAS)
uses the k-Nearest Neighbour classifier. As part of
the processing, the multi-class input data is
dichotomized into two classes. The classifier decides
whether each test taker sample is within-class or
between-class. A within-class decision means “you
are authenticated”; a between-class decision means
“you are not authenticated”. [3]
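The dichotomy transformation and the nearest-neighbour decision can be sketched as follows. This is a simplified illustration rather than the BAS implementation: the function names, the use of absolute feature differences, the Euclidean distance and the simple majority vote are all assumptions made for the example.

```python
from itertools import combinations
from math import dist


def dichotomize(samples):
    """Turn labelled feature vectors into labelled difference vectors.

    samples: list of (user_id, feature_vector) pairs.
    Returns (difference_vector, label) pairs, where label is "within"
    for two samples from the same user and "between" otherwise.
    """
    pairs = []
    for (user_a, vec_a), (user_b, vec_b) in combinations(samples, 2):
        diff = [abs(a - b) for a, b in zip(vec_a, vec_b)]
        pairs.append((diff, "within" if user_a == user_b else "between"))
    return pairs


def knn_decision(train_pairs, query_diff, k=3):
    """Majority vote over the k training difference vectors nearest the query."""
    nearest = sorted(train_pairs, key=lambda p: dist(p[0], query_diff))[:k]
    within_votes = sum(1 for _, label in nearest if label == "within")
    return "within" if within_votes > k / 2 else "between"


# Toy two-feature vectors for two hypothetical users.
train = [
    ("u1", [0.20, 0.30]), ("u1", [0.22, 0.28]), ("u1", [0.21, 0.31]),
    ("u2", [0.80, 0.70]), ("u2", [0.78, 0.72]), ("u2", [0.81, 0.69]),
]
train_pairs = dichotomize(train)

# A test sample is compared with one enrolled sample of the claimed user u1.
enrolled_u1 = [0.20, 0.30]
genuine_query = [abs(a - b) for a, b in zip([0.21, 0.29], enrolled_u1)]
impostor_query = [abs(a - b) for a, b in zip([0.79, 0.71], enrolled_u1)]
genuine_result = knn_decision(train_pairs, genuine_query)    # "within": authenticated
impostor_result = knn_decision(train_pairs, impostor_query)  # "between": not authenticated
```

Working on difference vectors reduces the many-user identification problem to a two-class problem, so the same classifier can in principle be applied to users it was not trained on.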
The BAS asks for the maximum amount of dichotomy data to
use, which equates to the maximum number of inter-class
or intra-class samples to create for experimentation,
as well as the lowest N choices (which is used to
optimize testing performance and stipulates the
maximum nearest neighbour test). [2] After the
dichotomy model is applied, the data is saved and is
then ready to be processed using the BAS Accuracy
Calculator (Figure 9). The calculator uses the output
file from the Biometric Authentication System and
applies nearest neighbour calculations to determine
the false acceptance rate and the false rejection rate,
as well as the overall performance of the test, for each
of the nearest neighbour calculations.
4. Efficient Authentication Process
The Keystroke Feature Extractor program was
modified by the previous team to offer different
processes based upon whether it is working on train
or test data. By means of a switch in the interface a
choice can be made to use the training process to
output a file containing the standardization x-min and
x-max values for each feature. For the testing
process the recorded x-min and x-max values from
the training process will be imported and used to
perform the standardization. [5]
Figure 6 shows that the efficient experimental procedure
is as follows:
1. Extract a feature file from the training raw
   data and output a file of xmin/xmax values
2. Extract a feature file from the testing raw
   data by reading in the xmin/xmax value file
   from step 1
3. Run the authentication classifier on the
   training and testing feature files
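The first two steps can be sketched in Python as below. This is a minimal sketch that assumes the usual min-max formula, (x - xmin) / (xmax - xmin), and clamps test values that fall outside the training range to [0, 1]; the function names and the clamping are assumptions, not details confirmed for the KFE.

```python
def fit_min_max(train_vectors):
    """Step 1: record the per-feature minimum and maximum of the training data."""
    columns = list(zip(*train_vectors))
    return [min(c) for c in columns], [max(c) for c in columns]


def standardize(vectors, x_min, x_max):
    """Scale each feature using the recorded training xmin/xmax values."""
    scaled = []
    for vec in vectors:
        row = []
        for x, lo, hi in zip(vec, x_min, x_max):
            if hi == lo:
                row.append(0.0)  # constant feature in training: nothing to scale by
            else:
                value = (x - lo) / (hi - lo)
                row.append(min(1.0, max(0.0, value)))  # clamp out-of-range test values
        scaled.append(row)
    return scaled


# Hypothetical two-feature vectors.
train = [[100.0, 2.0], [140.0, 4.0], [120.0, 3.0]]
test = [[130.0, 5.0]]

x_min, x_max = fit_min_max(train)               # step 1
train_scaled = standardize(train, x_min, x_max)
test_scaled = standardize(test, x_min, x_max)   # step 2 reuses the training values
```

Reusing the training values means test data never influences the scaling, so a test sample can be processed on its own without re-extracting the enrolment data.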
Figure 7: KFE interface
Figure 6: KBS efficient Authentication Process
Figure 8: BAS or KPC component interface
5. Test Results Analysis
After a feature vector file is generated using the
KFE (Figure 7), the authentication component of this
system, the Biometric Authentication System (BAS),
compares test and train data to determine whether
they are a match (Figure 8).
Figure 9: Accuracy Calculator
In the user authentication system, for a given
attempt by a user, one of the following four cases can
happen, where u1 is a registered user and u2 is an
unregistered user unknown to the system:
True Positive: u1 claims to be u1 and is accepted
False Reject: u1 claims to be u1 but is rejected
False Accept: u2 claims to be u1 but is accepted
True Negative: u2 claims to be u1 and is rejected
False Acceptance Rate (FAR) and False Rejection
Rate (FRR) are the error rates used to evaluate the
performance of the biometric classifiers. [3]
As the raw keystroke data samples were gathered
via two different scenarios, as previously explained,
four different experiments were administered in order
to test the performance of the new system.
Table 1 shows the result for the generic (Alpha)
test using 56 samples of training data and 56
samples of testing data, with an average of 279
keystrokes per sample. As can be seen from this
table, the best performance is 96.04% when the kNN
is 9, with the lowest FAR (0.89%) but a high
FRR (57.14%). The average performance is calculated
as 94.61%.
Table 1: BAS Results for Alpha version (generic)
kNN   FRR      FAR      Performance
1     36.90%   7.69%    90.71%
3     60.71%   1.85%    94.94%
5     57.14%   1.30%    95.65%
7     57.14%   1.17%    95.71%
9     57.14%   0.89%    96.04%
Avg   53.78%   2.58%    94.61%
Table 2 illustrates the result for the students’
real exam (Beta test) using 56 samples for training and
56 samples for testing the system, with an average of
435 keystrokes per sample.
Table 2: BAS Results for Beta version (real-exam)
kNN   FRR      FAR      Performance
1     16.67%   16.83%   83.18%
3     39.29%   7.14%    91.10%
5     44.05%   7.42%    90.58%
7     44.05%   7.42%    90.58%
9     42.86%   6.87%    91.17%
Avg   37.38%   9.14%    89.32%
By comparing Tables 1 and 2 we can see that the
performance for the real-world application is lower
than in the generic test.
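To make the relationship between the three reported figures concrete, the sketch below computes FRR, FAR and overall performance from decision counts. Performance is taken here to be the fraction of all decisions that are correct, which is an assumption about how the Accuracy Calculator defines it. The counts are illustrative and are not reported in this paper; they were chosen so that the resulting rates match the kNN = 1 row of Table 1.

```python
def error_rates(false_rejects, genuine_attempts, false_accepts, impostor_attempts):
    """Return (FRR, FAR, performance) as fractions between 0 and 1."""
    frr = false_rejects / genuine_attempts    # u1 claims to be u1 but is rejected
    far = false_accepts / impostor_attempts   # u2 claims to be u1 but is accepted
    total = genuine_attempts + impostor_attempts
    performance = 1 - (false_rejects + false_accepts) / total
    return frr, far, performance


# Illustrative counts (assumed, not taken from the paper).
frr, far, performance = error_rates(
    false_rejects=31, genuine_attempts=84,
    false_accepts=112, impostor_attempts=1456,
)
summary = f"FRR {frr:.2%}  FAR {far:.2%}  Performance {performance:.2%}"
```

Under this definition, impostor comparisons greatly outnumber genuine ones, so a high FRR can coexist with a high overall performance, which is the pattern the tables show.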
Table 3 shows the result of merging both the
generic and the students’ real-exam data (Alpha and
Beta), using 112 samples for training and 112 samples
for testing the system, with an average of 385
keystrokes per sample. The performance is
higher than that of both the Alpha test (Table 1) and
the Beta test (Table 2), at 97.41% when the kNN is 9.
The FAR is 0.94%, which is lower than that of the Beta
test and only slightly higher than that of the Alpha
test (a difference of 0.05 percentage points) when the
kNN is 9. The trade-off is that the FRR for this case
is considerably higher than in both the Alpha and Beta
tests: the FRR is 61.90%, as opposed to 57.14% for the
Alpha test and 42.86% for the Beta test.
Table 3: BAS Results for mixing Alpha & Beta raw
data
kNN   FRR      FAR      Performance
1     42.26%   5.90%    93.11%
3     67.26%   1.57%    96.65%
5     63.69%   1.37%    96.94%
7     66.07%   1.16%    97.09%
9     61.90%   0.94%    97.41%
Avg   60.24%   2.19%    96.24%
Table 4 shows the result of merging both
exams (Alpha and Beta) using 168 samples for
training and 56 samples for testing the system, with
the same average of 385 keystrokes per sample. The
results show that this mix increased the performance
by almost an entire percentage point over the
previous mix (Table 3): when the kNN is 9, the
performance is 98.38%, which is the highest of the
four experiments. Furthermore, the FAR is the
lowest of the four experiments, at 0.33%. However,
the FRR is also the highest, at 71.43%, when the
kNN is 9.
Table 4: BAS Results for mixing Alpha & Beta with
more training samples
kNN   FRR      FAR      Performance
1     60.71%   1.85%    97.08%
3     78.57%   0.46%    98.12%
5     78.57%   0.46%    98.12%
7     71.43%   0.60%    98.12%
9     71.43%   0.33%    98.38%
Avg   72.14%   0.74%    97.96%
Both Tables 3 and 4 indicate that the more the
system is trained, the higher the performance
percentages.
In the last case, illustrated by Table 4, we tested
the new KBS with around 86,240 test patterns
(merging the generic and student real-exam data with
more training samples) and obtained the best FAR of
0.33%, when the FRR was 71.43% and the
performance value was 98.38%. By contrast, the best
performance for Table 2 (pure training and testing
features from students only) is 91.17%.
As a result, in order to increase the accuracy of
authenticating students taking online exams with the
revised KES, it is necessary to increase the number of
extra samples merged into the data from prior
samples in the data bank in order to increase the
training of the system. Furthermore, working to
improve the feature extractor to include more features
and decrease error rates will help to increase the BAS
performance.
6. Future Work
The current Pace Keystroke Biometric System
(PKBS) is the result of several evolutionary
prototyping projects, each focusing on different
components of the system. Because of this approach,
the system lacks certain elements of cohesion,
automation and process which would be necessary
before being released in a production environment. It
is recommended that the following steps be taken to
further the current work:
- Key hold times for the space character
  should not be considered, because the user
  usually pauses after pressing the space key to
  recollect what has to be typed next.
- Adapt the system to the changing typing
  patterns of the users.
- Implement a spell check tool for users in the
  internet browser, which will discourage them
  from copying/pasting from other programs
  which do offer spell checking.
- Utilize a database for all keystroke biometric
  system data.
- Utilize web services and stored procedures
  for all components of the keystroke
  biometric system rather than executable files.
- Develop an administrator interface allowing
  instructors to create sample and
  actual tests, as well as set timeframes for
  when they can be taken.
- Following each course, merge individual keystroke
  biometric information into
  a master user table which tracks an
  individual’s academic career.
- Develop the Keystroke Entry System to be
  cross-browser compatible.
- Develop the system to either immediately
  process results or to have the processing run
  as a nightly job.
- Develop a system (email and through the
  admin interface) to alert an instructor of
  suspicious results.
- Revise the Keystroke Feature Extractor (KFE)
  program to increase PKBS performance for
  real-world application with a small number of
  samples.
- Integrate the Keystroke Entry System with
  Blackboard for security and a seamless user
  experience.
7. Conclusion
Keystroke biometrics is an inexpensive, yet
effective method of user identification and
authentication. The Pace Keystroke Biometric
System (PKBS), if developed further, could be
particularly well suited to student online testing when
embedded within a browser and customized for the
institution utilizing the system.
While the main concern currently surrounds
academic integrity during online testing, this sort of
authentication could be used for the same reasons
when training and orientation examinations are
administered in a business setting.
The results of our four different experiments
demonstrate that, in order to increase the authentication
accuracy for students taking online exams using the
revised Keystroke Entry System (KES), we need to
gather more training samples, which can be taken
prior to the exams in order to train the
system further. Moreover, revising the
Keystroke Feature Extractor (KFE) program to
generate more features and a lower error
rate would further prepare the system for real-world
applications with a smaller number of samples, thus
increasing the BAS performance rate.
8. Resources
[1] S. Janapala, S. Roy, J. John, L. Columbu,
J. Carrozza, R. Zack, and C. Tappert,
“Refactoring a Keystroke Biometric System,”
paper b1, Proc. Student-Faculty Research Day,
Seidenberg School of CSIS, Pace University,
New York.
[2] S. Bharati, R. Haseem, R. Khan, M.
Ritzmann, A. Wong, “Biometric Authentication
System using the Dichotomy Model,” paper c3,
Proc. Student-Faculty Research Day, Seidenberg
School of CSIS, Pace University, New York,
May 2008.
[3] C. C. Tappert, S.-H. Cha, M. Villani, and R.
S. Zack, “A Keystroke Biometric System for
Long-Text Input,” Int. J. Info. Security and
Privacy (IJISP), Vol 4, No 1, 2010, pp 32-
[4] R. S. Zack, C. C. Tappert and S.-H. Cha,
“Performance of a Long-Text-Input Keystroke
Biometric Authentication System Using an
Improved k-Nearest-Neighbor Classification
Method,” Proc. IEEE 4th Int Conf Biometrics:
Theory, Apps, and Systems (BTAS 2010),
Washington, D.C., Sep 2010.
[5] A. C. Caicedo, K. Chan, D. A. Germosen, S.
Indukuri, M. N. Malik, D. Tulasi, M. C. Wagner,
R. S. Zack and C. C. Tappert, “Keystroke
Biometric: Data/Feature Experiments,” paper b5,
Proc. Student-Faculty Research Day, Seidenberg
School of CSIS, Pace University, New York,
May 2010.