Keystroke Biometric Identification and
Authentication on Long-Text Input
Summary of eight years of research in this area
Charles Tappert
Seidenberg School of CSIS, Pace University
DPS+PhD Biometric Dissertations
• Completed
– Keystroke Biometric (long text input)
• Identification: feasibility study – Mary Curtin 2006
• Identification: desk/laptop + copy/free text – Mary Villani 2006
• Identification: touch-type feature/fallback hierarchy – Mark Ritzmann 2007
• Authentication: kNN ROC curve derivation methods – Robert Zack 2010
• Authentication: statistical fallback for missing/incomplete info – Steve Kim 2013
– Keystroke Biometric (short and long text input)
• Authentication: text/spreadsheet/browser/keypad input – Ned Bakelman 2014
– Stylometry + Keystroke Biometric (long text input)
• Authentication of online test-takers – John Stewart 2012
• In Progress
– Keystroke Biometric (short and long text input)
• Authentication of Impaired Users – Gonzalo Perez
• Authentication on Smartphones of Short Text Input – Mike Coakley
• Authentication System Improvements – Vinnie Monaco
– Stylometry
• Authentication of Facebook Postings – Jenny Li
– Speaker Verification
• Common passphrase approach: “My name is” – Jonathan Leet
• Qualitative study replacing username/password with biometrics – James Sicuranza?, Hugh Eng?
– Mouse Movement (Phil Dressner?)
– Authentication Biometrics on Handhelds (Leigh Anne Clevenger?, Alecia Copeland?, Mantie Reid?, Rich Barilla?, Stephanie Haughton?)
Keystroke Biometric Studies
References
1. L. Jain, J.V. Monaco, M.J. Coakley, and C.C. Tappert, Passcode Keystroke Biometric Performance on Smartphone Touchscreens is Superior to
that on Hardware Keyboards, Int. J. Research in Computer Apps. & Info. Tech., IASTER, Vol. 2, Issue 4, July-August 2014, pp 29-33. Preview
of Coakley’s dissertation.
2. S. Kim, S. Cha, J.V. Monaco, and C.C. Tappert, A Correlation Method for Handling Infrequent Data in Keystroke Biometric Systems, Proc. 2nd
Int. Workshop Biometrics & Forensics (IWBF 2014), Malta, Mar 2014. Summary of Kim’s dissertation.
3. J.V. Monaco, J.C. Stewart, S. Cha, and C.C. Tappert, Behavioral Biometric Verification of Student Identity in Online Course Assessment and
Authentication of Authors in Literary Works, Proc. IEEE 6th Int. Conf. Biometrics, Wash. D.C., Sep 2013. Preview of Monaco’s dissertation.
4. N. Bakelman, J.V. Monaco, S. Cha, and C.C. Tappert, Keystroke Biometric Studies on Password and Numeric Keypad Input, Proc. 2013
European Intelligence and Security Informatics Conf., Sweden, Aug 2013. Summary of Bakelman’s dissertation.
5. J.V. Monaco, N. Bakelman, S. Cha, and C.C. Tappert, Recent Advances in the Development of a Long-Text-Input Keystroke Biometric
Authentication System for Arbitrary Text Input, Proc. European Intell. and Sec. Inform. Conf., Sweden, Aug 2013.
6. J.V. Monaco, N. Bakelman, S. Cha, and C.C. Tappert, Developing a Keystroke Biometric System for Continual Authentication of Computer
Users, Proc. European Intell. and Sec. Inform. Conf., Denmark, Aug 2012, pp 210-216.
7. J.C. Stewart, J.V. Monaco, S. Cha, and C.C. Tappert, "An Investigation of Keystroke and Stylometry Traits," Proc. Int. Joint Conf. Biometrics
(IJCB 2011), Wash. D.C., Oct 2011. Summary of Stewart’s dissertation.
8. C.C. Tappert, S. Cha, M. Villani, and R.S. Zack, "A Keystroke Biometric System for Long-Text Input," Int. J. Info. Security and Privacy (IJISP), Vol
4, No 1, 2010, pp 32-60. Best overall summary of keystroke system.
9. R.S. Zack, C.C. Tappert and S.-H. Cha, "Performance of a Long-Text-Input Keystroke Biometric Authentication System Using an Improved k-Nearest-Neighbor Classification Method," Proc. IEEE 4th Int Conf Biometrics: Theory, Apps, and Systems (BTAS 2010), Washington, D.C., Sep
2010. Summary of Zack’s dissertation.
10. S. Cha, Y. An, and C.C. Tappert, "ROC Curves for Multivariate Biometric Matching Models," Proc. Int. Conf. Artificial Intelligence and Pattern
Recognition, Orlando, Florida, July 2010.
11. C.C. Tappert, M. Villani, and S. Cha, "Keystroke Biometric Identification and Authentication on Long-Text Input," pp 342-367, Chapter 16 in
Behavioral Biometrics for Human Identification: Intelligent Applications, Edited by Liang Wang and Xin Geng, Medical Information Science
Reference, 2010.
12. M. Villani, C.C. Tappert, G. Ngo, J. Simone, H. St. Fort, and S. Cha, "Keystroke Biometric Recognition Studies on Long-Text Input under Ideal
and Application-Oriented Conditions," Proc. CVPR 2006 Workshop on Biometrics, New York, NY, June 2006. Summary of Villani’s
dissertation.
Introduction
Build a Case for Usefulness of Study
• Validate importance of study – applications
• Define keystroke biometric
• Appeal of keystroke over other biometrics
• Previous work on the keystroke biometric
• No direct study comparisons on same data
• Feature measurements
• Make case for using: data over the internet, long text input, free (arbitrary) text input
• Extends previous work by authors
• Summary of scope and methodology
• Summary of paper organization
Introduction
Validate importance of study – applications
• Internet authentication application
– Authenticate (verify) student test-takers
• Internet identification application
– Identify perpetrators of inappropriate email
• Internet security for other applications
– Important as more businesses move toward
e-commerce
Introduction
Define Keystroke Biometric
• The keystroke biometric is one of the less-studied behavioral biometrics
• Based on the idea that typing patterns are
unique to individuals and difficult to
duplicate
Introduction
Appeal of Keystroke Biometric
• Not intrusive – data captured as users type
– Users type frequently for business/pleasure
• Inexpensive – keyboards are common
– No special equipment necessary
• Can continue to check ID with keystrokes after
initial authentication
– As users continue to type
Introduction
Previous Work on Keystroke Biometric
• One early study goes back to typewriter input
• Identification versus authentication
– Most studies were on authentication
• Two commercial products on hardening passwords
– Few on identification (more difficult problem)
• Short versus long text input
– Most studies used short input – passwords, names
– Few used long text input – copy or free text
• Other keystroke problems studied
– One study detected fatigue, stress, etc.
– Another detected ID change via monitoring
Introduction
No Direct Study Comparisons on Same Data
• No comparisons on a standard data set
– (desirable, available for many biometric and
pattern recognition problems)
• Rather, researchers collect their own data
• Nevertheless, literature optimistic of
keystroke biometric potential for security
Introduction
Feature Measurements
• Features derived from raw data
– Key press times and key release times
– Each keystroke provides small amount of data
• Data varies from different keyboards, different
conditions, and different entered texts
• Using long text input allows
– Use of good (statistical) feature measurements
– Generalization over keyboards, conditions, etc.
Introduction
Make Case for Using
• Data over the internet
– Required by applications
• Long text input
– More and better features
– Higher accuracy
• Free text input
– Required by applications
– Predefined copy texts unacceptable
Introduction
Extends Previous Work by Authors
• Previous keystroke identification study
– Ideal conditions
• Fixed text and
• Same keyboard for enrollment and testing
– Less ideal conditions
• Free text input
• Different keyboards for enrollment and testing
Introduction
Summary of Scope and Methodology
• Determine distinctiveness of keystroke
patterns
• Two application types
– Identification (1-of-n problem)
– Authentication (yes/no problem)
• Two independent variables (4 data quadrants)
– Keyboard type – desktop versus laptop
– Entry mode – copy versus free text
Keystroke Biometric System Components
• Raw keystroke data capture
• Feature extraction
• Classification for identification
• Classification for authentication
Keystroke Biometric System
Raw Keystroke Data Capture
Keystroke Biometric System
Feature Extraction
• Mostly statistical features
– Averages and standard deviations
• Key press times
• Transition times between keystroke pairs
– Individual keys and groups of keys – hierarchy
• Percentage features
– Percentage use of non-letter keys
– Percentage use of mouse clicks
• Input rates – average time/keystroke
Keystroke Biometric System
Feature Extraction
Figure: A two-key sequence (th) showing the two transition measures, t1 and t2, for a) non-overlapping and b) overlapping t-key and h-key presses. The figure also marks the key duration; the horizontal axis is time.
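The duration and transition measures in the figure can be sketched from raw press and release times. The slide does not define t1 and t2, so this sketch assumes t1 runs from the release of the first key to the press of the second (negative when the keys overlap) and t2 runs from press to press. The `Keystroke` class and the timing values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Keystroke:
    key: str
    press: float    # key press time, milliseconds
    release: float  # key release time, milliseconds


def duration(k):
    """Time the key is held down."""
    return k.release - k.press


def transitions(first, second):
    """Return (t1, t2) for a two-key sequence.

    Assumed definitions (not stated on the slide):
    t1 = press of second key - release of first key (negative on overlap)
    t2 = press of second key - press of first key
    """
    t1 = second.press - first.release
    t2 = second.press - first.press
    return t1, t2


# Hypothetical timings for the sequence "th"
t_key = Keystroke("t", press=0.0, release=90.0)
h_key = Keystroke("h", press=140.0, release=220.0)      # a) non-overlapping
h_overlap = Keystroke("h", press=60.0, release=150.0)   # b) overlapping

print(duration(t_key))
print(transitions(t_key, h_key))
print(transitions(t_key, h_overlap))
```

Under these assumptions only t1 changes sign between the two cases, which is why both measures are kept.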
Keystroke Biometric System
Feature Extraction
Figure: Hierarchy tree for the 39 duration categories. Node labels: All Keys; All Letters; Non Letters; Vowels (e, a, o, i, u); Freq Cons (t, n, s, r, h); Next Freq Cons (l, d, c, p, f); Least Freq Cons (m, w, y, b, g); Other; Left Letters; Right Letters; Space; Shift; Punctuation (. , ‘ Other); Numbers; Other.
Keystroke Biometric System
Feature Extraction
Figure: Hierarchy tree for the 35 transition categories. Node labels: Any-key/Any-key; Letter/Letter; Letter/Non-letter; Non-letter/Letter; Non-letter/Non-letter; Cons/Cons (th, nd, st); Vowel/Cons (an, in, er, es, on, at, en, or); Cons/Vowel (he, re, ti); Vowel/Vowel (ea); Left/Left; Left/Right; Right/Left; Right/Right; Double Letters; Letter/Space; Letter/Punct; Shift/Letter; Space/Letter; Space/Shift; Punct/Space.
Keystroke Biometric System
Feature Extraction
• Fallback procedure for few/missing samples
• When the number of samples is less than a fallback
threshold, take the weighted average of the key’s mean
and the fallback mean
$$\mu'(i) = \frac{n(i)\,\mu(i) + k_{\text{fallback weight}}\,\mu(\text{fallback})}{n(i) + k_{\text{fallback weight}}}$$
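A minimal sketch of the fallback rule, assuming n(i) is the number of samples observed for key i, mu(i) is that key's mean, and mu(fallback) is the mean of its parent category in the hierarchy. The function name, argument names, and numeric values are illustrative and do not come from the slides.

```python
def fallback_mean(n_i, mu_i, mu_fallback, k_weight, threshold):
    """Blend a key's mean with its fallback mean when samples are scarce.

    n_i         -- number of samples observed for the key
    mu_i        -- mean of those samples (use 0.0 when n_i is 0)
    mu_fallback -- mean of the fallback category
    k_weight    -- fallback weight; must be positive
    threshold   -- fallback threshold on the sample count
    """
    if n_i >= threshold:
        return mu_i  # enough samples: use the key's own mean
    return (n_i * mu_i + k_weight * mu_fallback) / (n_i + k_weight)


# Hypothetical values: 2 samples of a rare key, fallback threshold of 5
print(fallback_mean(2, 100.0, 130.0, k_weight=1.0, threshold=5))
```

With no samples at all the result is simply the fallback mean, and as n(i) grows the key's own mean dominates.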
Keystroke Biometric System
Feature Extraction
• Two preprocessing steps
– Outlier removal
• Remove duration and transition times > threshold
– Feature standardization
• Convert features into the range 0-1
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
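The two preprocessing steps might look like the sketch below. The slide only says that times above a threshold are removed; expressing the threshold as a number of standard deviations from the mean, and repeating the removal for several passes, is an assumption suggested by the later slides on outlier removal passes and distance (sigma). All names and values are illustrative.

```python
import statistics


def remove_outliers(times, n_sigma=2.0, passes=1):
    """Drop values farther than n_sigma standard deviations from the mean.

    The removal is repeated `passes` times, recomputing mean and
    standard deviation on the surviving values each time.
    """
    kept = list(times)
    for _ in range(passes):
        if len(kept) < 2:
            break
        mean = statistics.mean(kept)
        sd = statistics.stdev(kept)
        if sd == 0:
            break
        kept = [t for t in kept if abs(t - mean) <= n_sigma * sd]
    return kept


def min_max(x, x_min, x_max):
    """Standardize a feature into the range 0-1."""
    if x_max == x_min:
        return 0.0  # degenerate range: no spread to scale by
    return (x - x_min) / (x_max - x_min)


# Hypothetical key durations in milliseconds, with one long pause
durations = [100, 105, 95, 102, 98, 101, 99, 103, 97, 1000]
print(remove_outliers(durations))
print(min_max(150, 100, 300))
```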
Keystroke Biometric System
Classification for Identification
• Nearest neighbor using Euclidean distance
• Compare a test sample against the training
samples, and the author of the nearest training
sample is identified as the author of the test
sample
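As a minimal sketch, nearest-neighbor identification over standardized feature vectors could be written as follows. The author names and two-feature vectors are hypothetical; the real system uses many more features.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(test_vector, training):
    """Return the author of the training sample nearest to test_vector.

    training -- list of (author, feature_vector) pairs
    """
    author, _ = min(training, key=lambda item: euclidean(test_vector, item[1]))
    return author


# Hypothetical enrolled samples
training = [
    ("alice", [0.10, 0.20]),
    ("alice", [0.12, 0.25]),
    ("bob", [0.80, 0.90]),
]
print(identify([0.15, 0.25], training))
```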
Keystroke Biometric System
Classification for Authentication
Transformation to dichotomy model
• Cha’s vector-distance (dichotomy)
Figure: Transformation from feature space to distance space (axes f1 and f2 in both panels). Labels in the figure include d11 through d33, d1,1, d1,2, d1,3, and the pairs (d1,2, d1,3) and (d1,3, d2,1).
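The general idea of the transformation can be sketched as below, assuming each pair of feature vectors is mapped to a vector of absolute feature differences and labeled as within-person or between-person. This is an illustration of the dichotomy idea only, not a reproduction of the system's implementation; the sample data are hypothetical.

```python
from itertools import combinations


def to_distance_space(samples):
    """Map pairs of feature vectors to labeled difference vectors.

    samples -- list of (person, feature_vector) pairs
    Returns (within, between): difference vectors for same-person pairs
    and for different-person pairs.
    """
    within, between = [], []
    for (p1, v1), (p2, v2) in combinations(samples, 2):
        diff = [abs(a - b) for a, b in zip(v1, v2)]
        if p1 == p2:
            within.append(diff)
        else:
            between.append(diff)
    return within, between


# Hypothetical samples: two from person A, one from person B
samples = [("A", [1.0, 5.0]), ("A", [2.0, 4.0]), ("B", [9.0, 1.0])]
within, between = to_distance_space(samples)
print(within)
print(between)
```

Authentication then becomes a two-class problem: a difference vector from a claimed identity is classified as within-person (accept) or between-person (reject), regardless of how many users are enrolled.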
Experimental and Data Collection Design
• Two independent variables
– Keyboard type
• Desktop – all Dell
• Laptop – 90% Dell + IBM, Compaq, Apple, HP, Toshiba
– Input mode
• Copy task – predefined text
• Free text input – e.g., arbitrary email
Subjects and Data Collection
• Subjects provided samples in at least two quadrants
• Five samples per quadrant per subject
• Summary of subject demographics
Age        Female   Male   Total
Under 20   15       19     34
20-29      12       23     35
30-39      5        10     15
40-49      7        11     18
50+        11       5      16
All        50       68     118
Experimental Results
• Identification experimental results
• Authentication experimental results
• Longitudinal study results
• System hierarchical model and parameters
– Hierarchical fallback model
– Outlier parameters
– Number of enrollment samples
– Input text length
– Probability distributions of statistical features
Experimental Results
Identification Experimental Results
Figure: Identification performance under ideal conditions (same keyboard type and input mode, leave-one-out procedure). Percent accuracy (90% to 100%) versus number of subjects (0 to 100) for Desk-Copy, Lap-Copy, Desk-Free, and Lap-Free.
Experimental Results
Identification Experimental Results
Figure: Identification performance under non-ideal conditions (train on one file, test on another). Percent accuracy (0% to 100%) versus number of subjects (0 to 100) for Groups 1 to 6.
Experimental Results
Authentication Experimental Results
Figure: Authentication performance under ideal conditions (weak enrollment: train on 18 subjects and test on 18 different subjects). Performance, FRR, and FAR (percent, 0% to 100%) for the DeskCopy, LapCopy, DeskFree, and LapFree conditions.
Experimental Results
Longitudinal Study Results
• Identification – 13 subjects at 2-week intervals
– Average 6 arrow groups: 90% -> 85% -> 83%
• Authentication – 13 subjects at 2-week intervals
– Average 6 arrow groups: 90% -> 87% -> 85%
• Identification – 8 subjects at 2-year interval
– Average 6 arrow groups: 84% -> 67%
• Authentication – 8 subjects at 2-year interval
– Average 6 arrow groups: 94% -> 92%
(all above results under non-ideal conditions)
Experimental Results
System hierarchical model and parameters
Touch-type hierarchy tree for durations (Mark Ritzmann)
Experimental Results
System hierarchical model and parameters
Identification accuracy versus outlier removal passes
Experimental Results
System hierarchical model and parameters
Identification accuracy versus outlier removal distance (sigma)
Experimental Results
System hierarchical model and parameters
Figure: Identification accuracy versus enrollment samples. Percent accuracy (70 to 100) versus number of enrollment samples (1 to 4).
Experimental Results
System hierarchical model and parameters
Identification accuracy versus input text length
Experimental Results
System hierarchical model and parameters
Distributions of “u” duration times for each entry mode
Conclusions
• Results are important and timely as more
people become involved in the applications of
interest
– Authenticating online test-takers
– Identifying senders of inappropriate email
• High performance (accuracy) results if
– 2 or more enrollment samples/user
– Users use same keyboard type
ROC Curves (Robert Zack, 2010)
ROC curves from the kNN classifier with k=21: method m-kNN (left),
method wm-kNN (center), and method hd-kNN (right).
FAR and FRR versus threshold
Closed 14-14 system, kNN classifier with k=21: FAR and FRR versus
threshold for method m-kNN (left), wm-kNN (center), hd-kNN (right).
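For readers unfamiliar with these plots, the sketch below shows how FAR and FRR follow from a decision threshold and how an equal error rate (EER) can be approximated by sweeping that threshold. It is a generic illustration, not the m-kNN, wm-kNN, or hd-kNN derivation methods. It assumes each comparison yields a distance score and that a claim is accepted when the score is at or below the threshold; the scores are hypothetical.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) for distance scores at a given threshold.

    genuine  -- scores from same-person comparisons
    impostor -- scores from different-person comparisons
    A claim is accepted when its score is <= threshold.
    """
    frr = sum(d > threshold for d in genuine) / len(genuine)
    far = sum(d <= threshold for d in impostor) / len(impostor)
    return far, frr


def approximate_eer(genuine, impostor):
    """Sweep observed scores as thresholds; return (threshold, EER estimate)
    at the point where FAR and FRR are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]


# Hypothetical distance scores
genuine = [0.1, 0.2, 0.3, 0.6]
impostor = [0.4, 0.7, 0.8, 0.9]
print(far_frr(genuine, impostor, 0.4))
print(approximate_eer(genuine, impostor))
```

Plotting FAR against FRR over all thresholds gives the ROC curve; the EER is the point where the two rates cross.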
Conclusions
(Robert Zack, Authentication Study, 2010)
• Keystroke password performance – approximately 10% EER
– See extensive study by Killourhy & Maxion, 2009
– Advertised performance of commercial products is exaggerated
• Keystroke long-text performance – approximately 1% EER
– Reasonable considering powerful statistical features
• Closed system better than open system performance
• Three ROC curve derivation methods developed for kNN
procedure
– All are two-parameter methods – k plus a threshold
Online Test-Taker Authentication
(John Stewart, 2011)
• Best Keystroke Performance – 0.55% EER
– Closed system of 30 students
• Best Previous Keystroke Performance – 1.0% EER
– Closed system of 14 students (Robert Zack, 2010)
• Best Stylometry Performance – approximately 30.0% EER
– Keystroke biometric operates at the automatic motor control level
– Because stylometry operates at a higher cognitive word/syntax level,
longer text passages are required for reasonable performance
• This hypothesis was verified on much longer texts of short novels
Keystroke Data Capture Systems
• Java Applet
– Mary Curtin, Mary Villani, Mark Ritzmann, Robert Zack,
Vinnie Monaco/Ned Bakelman (EISIC paper)
• JavaScript (Vinnie Monaco)
– John Stewart / Vinnie Monaco
• Fimbel Open Source Keylogger
– Ned Bakelman / Vinnie Monaco
• Should we develop our own keylogger?
Continual Authentication of Computer Users
(EISIC 2013 Conference Paper)
• Motivation – The technology is applicable to a wide
range of government, private company, and academic
applications worldwide
– For example, to detect intruders, the U.S. Government wants
to continually authenticate all government computer users,
both military and non-military
• U.S. DARPA 2010 and 2012 Requests for Proposals
• Requirement – detect intruder within minutes
• Current study focuses on this fast detection application
– Authentication of students taking online tests
• U.S. Higher Education Opportunity Act of 2008
EISIC 2013
Continual Burst Authentication Strategy
Assumptions
• Most computer users tend to have bursts of input
activity interspersed with periods of inactivity while
doing other things
• The application is designed for typical business or
government office computer usage
• Note: it would be interesting to determine the
frequency and duration of bursts of computer input
activity in typical office environments
Continuous vs Continual Authentication
with Data Capture Windows
• Continuous (ongoing) burst authentication
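Figure: three bursts (Burst 1, Burst 2, Burst 3), each with a 1-min data capture window, on a timeline marked 0, 5 min, and 10 min.
• Continual burst authentication with pauses
Figure: three bursts (Burst 1, Burst 2, Burst 3), each with a 1-min data capture window, separated by pauses labeled "Pause Threshold"; timeline marks at 0, 8 min, and 30 min.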
Continual Burst Strategy after Pauses
Reduces Frequency of Authentications
• Avoids capture of excessive quantities of data
• Reduces need for excessive computing resources
• Reduces false alarm rate
• Still provides sufficient data for continual training of the biometric system
Two Important Time Periods
for Continual Burst Authentication
1. Length of the data capture window
– Short enough to catch an intruder before
significant harm is caused
• On the order of minutes – DARPA
– Long enough to make an accurate detection and
reduce false alarms
2. Length of the pause
– Must be shorter than entry time of intruder
– Long enough to reduce authentication rate
Note: periods of little computer activity cause long pauses
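One possible reading of the two time periods is sketched below: a data capture window opens at the first keystroke and again at the first keystroke after any gap in input at least as long as the pause threshold. This interpretation, the function name, and the parameter values are assumptions made for illustration; the slides do not specify the scheduling logic.

```python
def capture_windows(timestamps, window=60.0, pause_threshold=300.0):
    """Return (start, end) capture windows for a stream of keystrokes.

    timestamps      -- sorted keystroke times in seconds
    window          -- length of the data capture window in seconds
    pause_threshold -- gap in input, in seconds, that triggers a new
                       authentication at the next keystroke
    """
    windows = []
    previous = None
    for t in timestamps:
        if previous is None or t - previous >= pause_threshold:
            windows.append((t, t + window))
        previous = t
    return windows


# Hypothetical session: typing, a long pause, typing, a longer pause
timestamps = [0, 10, 20, 500, 510, 2000]
print(capture_windows(timestamps))
```

A longer pause threshold yields fewer authentications, while a shorter one narrows the gap in which an intruder could start typing unnoticed, which is the trade-off the two bullet groups describe.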
Possible Broader Intrusion Detection Plan
Multi-biometric System
• Motor control level – keystroke + mouse movement
• Linguistic level – stylometry (char, word, syntax)
• Semantic level – target likely intruder commands
Figure: three-level diagram with the labels Intruder (Semantic Level), Stylometry (Linguistic Level), and Keystroke + Mouse (Motor Control Level).
Three Experiments
Dichotomy Model kNN Classification Leave-One-Out Procedure
Experimental Results
EER versus #Keystrokes
Experimental Results
ROC Curves at Maximum #Keystrokes
Keystrokes per Typing Speed
• Average typing speed ~200 keystrokes/min
• Professional typing speed ~400 keystrokes/min
• Therefore, at average typing speed the EER versus
#keystrokes graph goes from about ½ minute to 4
minutes indicating the time to detect an intruder
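The conversion from keystroke count to detection time is simple arithmetic, sketched here. The counts of 100 and 800 keystrokes are not read from the graph; they are the values implied by the ½-minute and 4-minute figures at 200 keystrokes/min.

```python
def minutes_to_type(keystrokes, rate_per_min=200):
    """Time in minutes needed to produce a given number of keystrokes."""
    return keystrokes / rate_per_min


# Implied endpoints of the EER versus #keystrokes graph
for count in (100, 800):
    average = minutes_to_type(count, 200)       # average typist
    professional = minutes_to_type(count, 400)  # professional typist
    print(count, average, professional)
```

At the professional rate of about 400 keystrokes/min, the same keystroke counts correspond to half the time.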
Conclusions
(EISIC 2013 Conference Paper)
• As the number of keystrokes per test sample increases,
EER decreases roughly logarithmically
• EER increases with population size
• Performance results of 99.6% on 14, 98.3% on 30, and
96.3% on 119 participants indicate the strong
potential of this approach