Keyboard Acoustic Emanations Revisited

Advertisement
• Security problems of your keyboard
– Authentication based on keystrokes
– Compromising emanations: electrical, mechanical, or acoustic
– Supply-chain attacks (Bluetooth, SD card)
– Power usage?
• Keystroke biometrics with number-pad input (DSN 2010)
– 28 users typed the same 10-digit number
– Used statistical machine learning techniques
– Detection rate: 99.97%
– False alarm rate: 1.51%
– Can be used for real-life two-factor authentication
Keyboard Acoustic Emanations Revisited
Li Zhuang, Feng Zhou and J. D. Tygar
U. C. Berkeley
Motivation
• Emanations of electronic devices leak information
• How much information is leaked by emanations?
• Apply statistical learning methods to security
– What can be learned from recordings of typing on a keyboard?
Keyboard Acoustic Emanations
• Leaking information by acoustic emanations
[Illustration: Alice types her password while an eavesdropper records the keyboard's sound]
Acoustic Information in Typing
• Frequency information in the sound of each typed key
• Why do keystrokes make different sounds?
– Different locations on the supporting plate
– Each key is slightly different
• [Asonov and Agrawal 2004]
Timing Information in Typing
• Time between two keystrokes
• Duration of each keystroke
• E.g., [Song, Wagner and Tian 2001]
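To make the two features concrete, here is a minimal sketch that computes them from press/release timestamps; the timestamps are invented for illustration, not measured data:

```python
import numpy as np

# Illustrative only: press/release times (seconds) for four keystrokes.
press = np.array([0.00, 0.31, 0.55, 0.92])
release = np.array([0.09, 0.40, 0.66, 1.01])

inter_key_latency = np.diff(press)   # time between consecutive keystrokes
hold_duration = release - press      # how long each key stays pressed

print(inter_key_latency)  # [0.31 0.24 0.37]
print(hold_duration)      # [0.09 0.09 0.11 0.09]
```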
Previous Work vs. Our Approach
                          Asonov and Agrawal                          Ours
Requirement               Text-labeled recordings                     Direct recovery
Analogy in crypto         Known-plaintext attack                      Ciphertext-only attack
Feature extraction        FFT                                         Cepstrum
Learning method           Supervised learning with neural networks    Clustering (K-means, Gaussian), EM algorithm
Language model            (none)                                      HMMs at different levels
Feedback-based training   (none)                                      Self-improving feedback after initial training
Key Observation
• Build acoustic model for keyboard & typist
• Non-random typed text (English)
– Limited number of words
– Limited letter sequences (spelling)
– Limited word sequences (grammar)
• Build language model
– Statistical learning theory
– Natural language processing
Overview
[Pipeline diagram]
Initial training: wave signal → Feature Extraction → Unsupervised Learning → Language Model Correction → recovered keystrokes → Sample Collector → Classifier Builder → keystroke classifier
Subsequent recognition: wave signal → Feature Extraction → Keystroke Classifier → Language Model Correction (optional) → recovered keystrokes
Feature Extraction
(Pipeline diagram repeated, with the Feature Extraction stage highlighted.)
Sound of a Keystroke
• How to represent each keystroke?
– Vector of features: FFT, Cepstrum
– Cepstrum features are widely used in speech recognition
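As a concrete illustration, here is a minimal real-cepstrum extractor in NumPy; the 10 ms window, Hann taper, and 32 coefficients are assumptions for this sketch, not necessarily the paper's exact settings:

```python
import numpy as np

def cepstrum_features(frame: np.ndarray, n_coeffs: int = 32) -> np.ndarray:
    """Real cepstrum of one keystroke frame: inverse FFT of the log magnitude
    spectrum. Low-order coefficients summarize the spectral envelope that
    makes one key sound different from another."""
    windowed = frame * np.hanning(len(frame))
    log_spectrum = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)  # avoid log(0)
    return np.fft.irfft(log_spectrum)[:n_coeffs]

# Example on a stand-in frame: 10 ms of audio at 44.1 kHz.
frame = np.random.default_rng(0).standard_normal(441)
print(cepstrum_features(frame).shape)  # (32,)
```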
Cepstrum vs. FFT
• Repeated the experiments from [Asonov and Agrawal 2004]
[Bar chart: classification accuracy (0 to 1) on the training set, test set 1, and test set 2 for linear classification, neural networks, and Gaussian mixtures; cepstrum features outperform FFT features in each case]
Unsupervised Learning
(Pipeline diagram repeated, with the Unsupervised Learning stage highlighted.)
Unsupervised Learning
• Group keystrokes into N clusters
– Assign each keystroke a label: 1, …, N
• Find the best mapping from cluster labels to characters
• Some character combinations are more common than others
– "th" vs. "tj"
– Hidden Markov Models (HMMs) capture this; a clustering sketch follows below
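A minimal clustering sketch, assuming cepstrum feature vectors and a guessed cluster count N = 50 (the paper also fits Gaussian mixtures with EM, for which scikit-learn's GaussianMixture could stand in):

```python
import numpy as np
from sklearn.cluster import KMeans

N_CLUSTERS = 50  # assumed: roughly the number of keys plus some slack
rng = np.random.default_rng(0)
features = rng.standard_normal((2500, 32))  # stand-in for real cepstrum vectors

# Each keystroke gets a cluster label in 0..N_CLUSTERS-1 (the slide's 1..N).
labels = KMeans(n_clusters=N_CLUSTERS, n_init=10, random_state=0).fit_predict(features)
print(np.bincount(labels)[:5])  # how many keystrokes landed in the first clusters
```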
Bi-grams of Characters
[HMM diagram: hidden characters “t” → “h” → “e” emitting cluster labels 5, 11, 2]
• Colored circles: cluster labels
• Empty circles: typed characters
• Arrows: dependencies
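The sketch below shows the decoding step for such a model: the Viterbi algorithm recovers the most likely character sequence from cluster labels. The character set and probability tables are toy values (in the attack they are learned with EM), and all probabilities are assumed smoothed to be nonzero:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden character sequence for observed cluster labels.
    pi[c]  : prior probability of character c
    A[c,d] : bigram probability that character d follows c
    B[c,k] : probability that character c is assigned cluster label k"""
    T, S = len(obs), len(pi)
    logp = np.zeros((T, S))
    back = np.zeros((T, S), dtype=int)
    logp[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        scores = logp[t - 1][:, None] + np.log(A)   # scores[c, d]
        back[t] = scores.argmax(axis=0)
        logp[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(logp[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy model: three characters, twelve cluster labels, as in the diagram above.
chars = ["t", "h", "e"]
pi = np.array([0.4, 0.3, 0.3])
A = np.array([[0.1, 0.8, 0.1],    # "th" is far more likely than "tj"-style pairs
              [0.2, 0.1, 0.7],
              [0.5, 0.3, 0.2]])
B = np.full((3, 12), 0.01)
B[0, 5] = B[1, 11] = B[2, 2] = 0.89   # each character mostly maps to one cluster
print([chars[s] for s in viterbi([5, 11, 2], pi, A, B)])  # ['t', 'h', 'e']
```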
Language Model Correction
(Pipeline diagram repeated, with the Language Model Correction stage highlighted.)
Word Tri-grams
• Spelling correction
• Simple statistical model of English grammar
• Use HMMs again to model word sequences
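A toy sketch of the idea: propose dictionary words close to each recovered word, then let word-trigram counts break ties. The vocabulary and counts here are invented, and the greedy left-to-right choice is a simplification of the paper's joint HMM decoding:

```python
from difflib import get_close_matches

VOCAB = ["the", "quick", "brown", "fox", "of", "to"]           # toy dictionary
TRI = {("<s>", "<s>", "the"): 5, ("<s>", "the", "quick"): 3,   # toy trigram counts
       ("the", "quick", "brown"): 3, ("quick", "brown", "fox"): 4}

def trigram_score(w1, w2, w3):
    return TRI.get((w1, w2, w3), 0) + 0.1   # add-0.1 smoothing: unseen != impossible

def correct(words):
    out = ["<s>", "<s>"]
    for w in words:
        # Candidate spellings within a small edit distance of the recovered word.
        candidates = get_close_matches(w, VOCAB, n=3, cutoff=0.6) or [w]
        out.append(max(candidates, key=lambda c: trigram_score(out[-2], out[-1], c)))
    return out[2:]

print(correct(["thw", "qujck", "brown", "fox"]))  # ['the', 'quick', 'brown', 'fox']
```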
Two Copies of Recovered Text
[Two copies of the recovered text: before and after spelling and grammar correction]
_____ = errors in recovery
_____ = errors corrected by grammar
Sample Collector
(Pipeline diagram repeated, with the Sample Collector stage highlighted.)
Feedback-based Training
(Pipeline diagram repeated, with the feedback-based training components highlighted.)
Feedback-based Training
• Recovered characters, after language-model correction, are fed back for more rounds of training (sketched below)
• Output: keystroke classifier
– Language independent
– Can be used to recognize random key sequences, e.g. passwords
– Possible representations: neural networks, linear classification, Gaussian mixtures
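A minimal sketch of one feedback round, assuming stand-in features, language-corrected labels, and confidence scores: keep the most confident fifth of the keystrokes, retrain a linear classifier (logistic regression here), and relabel everything. A real run would re-apply language-model correction between rounds:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = rng.standard_normal((2000, 32))   # stand-in cepstrum features
labels = rng.integers(0, 30, size=2000)      # stand-in language-corrected labels
confidence = rng.random(2000)                # stand-in confidence scores

for _ in range(3):                           # a few feedback rounds
    keep = confidence >= np.quantile(confidence, 0.8)  # most confident samples only
    clf = LogisticRegression(max_iter=1000).fit(features[keep], labels[keep])
    proba = clf.predict_proba(features)
    labels = clf.classes_[proba.argmax(axis=1)]        # relabel every keystroke
    confidence = proba.max(axis=1)                     # ...and refresh confidences
    # (a real run would re-apply language-model correction to `labels` here)
```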
Keystroke Classifier
(Pipeline diagram repeated, with the Keystroke Classifier stage highlighted.)
Experiment (1)
• Single keyboard
– Logitech Elite Duo wireless keyboard
– 4 data sets recorded in two settings: quiet and noisy
– Consecutive keystrokes are clearly separable in the signal
– Keystroke positions extracted from the signal automatically, with some manual error correction (see the segmentation sketch below)
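A crude sketch of that segmentation step under the "clearly separable" assumption: flag windows whose energy jumps well above the noise floor. The window size, threshold, and minimum gap are guesses, which is consistent with the authors still needing manual correction:

```python
import numpy as np

def find_keystrokes(signal, sr, win_ms=10, thresh_ratio=5.0, min_gap_s=0.1):
    """Return approximate keystroke onset times (seconds) as windows whose
    energy exceeds thresh_ratio times the median (noise-floor) energy."""
    win = int(sr * win_ms / 1000)
    energy = np.array([np.sum(signal[i:i + win] ** 2)
                       for i in range(0, len(signal) - win, win)])
    floor = np.median(energy)
    onsets, last = [], -np.inf
    for i, e in enumerate(energy):
        t = i * win / sr
        if e > thresh_ratio * floor and t - last > min_gap_s:
            onsets.append(round(t, 3))
            last = t
    return onsets

# Synthetic check: quiet noise with three loud bursts standing in for keystrokes.
sr = 44100
sig = 0.001 * np.random.default_rng(0).standard_normal(2 * sr)
for t in (0.3, 0.9, 1.5):
    sig[int(t * sr):int(t * sr) + 441] += 0.5
print(find_keystrokes(sig, sr))  # ≈ [0.3, 0.9, 1.5]
```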
– Data sets:

          Recording length   Number of words   Number of keys
  Set 1   ~12 min            ~400              ~2500
  Set 2   ~27 min            ~1000             ~5500
  Set 3   ~22 min            ~800              ~4200
  Set 4   ~24 min            ~700              ~4300

– Initial & final recognition rates:

            Set 1 (%)      Set 2 (%)      Set 3 (%)      Set 4 (%)
            Word   Char    Word   Char    Word   Char    Word   Char
  Initial   35     76      39     80      32     73      23     68
  Final     90     96      89     96      83     95      80     92
Experiment (2)
• Multiple Keyboards
– Keyboard 1: DELL QuietKey PS/2, P/N: 2P121
• In use for about 6 months
– Keyboard 2: DELL QuietKey PS/2, P/N: 035KKW
• In use for more than 5 years
– Keyboard 3: DELL Wireless Keyboard, P/N: W0147
• New
• 12-minute recording with ~2300 characters

            Keyboard 1 (%)   Keyboard 2 (%)   Keyboard 3 (%)
            Word   Char      Word   Char      Word   Char
  Initial   31     72        20     62        23     64
  Final     82     93        82     94        75     90
Experiment (3)
• Classification methods in feedback-based training
– Neural Networks (NN)
– Linear Classification (LC)
– Gaussian Mixtures (GM)
[Bar chart: final word and character recognition rates (%) for the NN, LC, and GM classifiers]
Limitations of Our Experiments
• Considered letters, period, comma, space, enter
• Did not consider numbers, other punctuation,
backspace, shift, etc.
• Easily separable keystrokes
• Only considered white noise (e.g. fans)
Defenses
• Physical security
• Two-factor authentication
• Masking noise
• Keyboards with uniform sound (?)
Summary
• Recovered keys from the sound alone
• Used typing of English text for training
• Applied statistical learning theory to security
– Clustering, HMMs, supervised classification, feedback-based incremental learning
• Recovered 96% of typed characters