Single-stroke Language-Agnostic Keylogging using Stereo-Microphones and Domain Specific Machine Learning
Sashank Narain, Amirali Sanatinia, Guevara Noubir
College of Computer and Information Science, Northeastern University

Motivation
• Side-channel attacks escape the security model
  – Pioneered academically by Paul Kocher (1996)
  – Timing, power analysis, sound
• Global proliferation of smartphones
  – An estimated 1.75 billion smartphones in 2014
• Used for many day-to-day and business operations
• Trusted with sensitive information
  – Personally Identifiable Information (PII)
  – Credit card numbers, passwords, location information
• Easy target for direct and indirect privacy breaches

Outline
• Problem & General Attack Scenario
• Android Sensors for Keystroke Inference
• Related Attacks
• Challenges in Keystroke Inference
• Our Approach
  – Using signal processing; designing a Meta-Algorithm
• Evaluation Results
• Mitigation Techniques

The Problem
• Sensors on smartphones bypass security mechanisms
  – Accelerometer, compass, gyroscope
    • Not sandboxed
    • Do not require explicit permissions
    • Indirectly leak sensitive information
  – GPS, camera & microphones
    • Require coarse explicit permissions, but with generic descriptions
    • Users may ignore permissions
    • Directly leak sensitive information
    • Can be accessed at any time

Attack Scenario
• Adversary lures the victim into installing a Trojan app
  – e.g., a 'To-do' app that supports speech recognition
• App records sensor data while the user types in the Trojan app
  – Builds training models from the collected data
    • On the phone or on a central server
• App invokes a service that waits for a sensitive activity to start
  – e.g., Your Favorite Bank login page
• App records sensor data during the sensitive activity
  – Generates predictions from the sensitive data using the training models

Motion Sensors in Android
• Easy to build apps using these APIs
  – Java methods in the Sensor class of the Android SDK
  – Native functions in the sensor.h header of the Android NDK
• Fixed three-dimensional co-ordinate system
  – Relative to the device
• Sensitive to minute motion such as keystrokes
[Figure: Android co-ordinate system]

Accelerometer
• Measures linear acceleration + gravity
  – Defined as TYPE_ACCELEROMETER
• Alternatively, sensor-fusion data measuring linear acceleration only
  – Defined as TYPE_LINEAR_ACCELERATION
• Extremely sensitive to motion and very noisy
  – High-pass filter removes gravity
  – Low-pass filter removes noise
• Used for initial experiments, discarded later on
  – Gyroscope is more stable for keystroke inference

Gyroscope
• Measures rate of rotation in radians/second
  – Defined as TYPE_GYROSCOPE
• Good for inference
  – Sensitive to motion but not very noisy
  – On the x/y axes, similar patterns for taps on the same key and different patterns for other keys
[Figure: similarity between two taps of character 'Q' and two taps of character 'V']

Gyroscope (cont.)
• To compute rotation:
  Incremental angle of rotation ≈ rate of rotation × sampling interval (dT)
• Challenge: gyroscope bias and bias drift require correction

Stereo-Microphones
• Microphone arrays are commonplace in modern smartphones
  – Used for audio enhancements, e.g., noise suppression
  – e.g., the HTC One series supports stereo recording
• Ideal for inference
  – Keystrokes on a soft keyboard can be recorded by the microphones
  – Distinct keystrokes produce different amplitudes and time delays
  – The time delay between the two microphones is fixed for a given key (8 samples for 'Q', 15 for 'V')
[Figure: sound waves for character 'Q' and 'V' taps]

Stereo-Microphones (cont.)
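The slides do not say how the per-key delay between the two microphones is measured. A common choice, assumed here, is to cross-correlate the left and right channels over the physically possible lags and keep the lag with the highest score. The search window of ±19 samples is the HTC One bound at 48 kHz. The functions `estimate_delay` and `place_pulse` and the synthetic damped-sine "tap" are hypothetical illustrations, not the authors' code.

```python
import math


def estimate_delay(left, right, max_lag):
    """Lag (in samples) that best aligns the right channel with the left one.

    A positive result means the tap reached the left microphone first
    (right[n] is roughly left[n - lag]); a negative result means it reached
    the right microphone first. Only lags in [-max_lag, max_lag] are searched,
    because the microphone spacing bounds the physically possible delay.
    """
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation score: sum of left[i] * right[i + lag] over the overlap.
        score = sum(
            left[i] * right[i + lag]
            for i in range(len(left))
            if 0 <= i + lag < len(right)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic "tap": a short damped 2.5 kHz burst sampled at 48 kHz (illustrative only).
FS = 48_000
PULSE = [math.exp(-k / 5.0) * math.sin(2 * math.pi * 2500 * k / FS) for k in range(64)]


def place_pulse(offset, length=300):
    """Return a silent recording with the pulse starting at `offset`."""
    signal = [0.0] * length
    signal[offset:offset + len(PULSE)] = PULSE
    return signal


left = place_pulse(100)   # tap arrives at the left microphone first ...
right = place_pulse(108)  # ... and 8 samples later at the right one
print(estimate_delay(left, right, max_lag=19))  # → 8
```

A real recording would first need tap detection, so that only a short window around each keystroke is correlated.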
• Delay in tap detection between the two microphones (M1, M2):
  Number of samples = (Distance(T, M1) − Distance(T, M2)) × Sampling rate / Speed of sound
• For the HTC One
  – Distance between microphones: 0.134 m
  – Maximum supported sampling rate: 48 kHz
  – Speed of sound in air: 340 m/s
  – Delay range: −19 to +19 samples
• For future devices with higher sampling rates
  – Example sampling rate: 192 kHz
  – Delay of up to ±75 samples (a range of 2 × 75) for a tap close to one microphone

Related Work (Attacks)
• First work by Cai & Chen 2011
  – Demonstrated feasibility of inference using the Orientation sensor
  – Developed an Android application called 'TouchLogger'
  – Accuracy tested on a number-only keypad in landscape mode
  – Inference accuracy of 70% on 3 datasets

Related Work (cont.)
• Owusu et al. 2012
  – QWERTY in landscape mode, area inference
  – Developed an Android app called 'ACCessory'
  – Datasets on an HTC ADR 6300 phone from 4 users
  – Successfully inferred 6-character passwords
    • 6 passwords out of 99 in 4.5 trials
    • An estimated 59 passwords out of 99 in 215 trials
• Xu, Bai & Zhu 2012
  – Lock-screen passwords and numbers entered during calls
    • e.g., credit card and PIN numbers
  – Used two sensors: Accelerometer for tap detection and Orientation for inference
  – Developed an Android app called 'TapLogger'
  – Datasets on HTC Aria and Google Nexus (One) phones from 3 users
  – Achieved 50% accuracy for 1 guess and high accuracy for the top 3 guesses

Related Work (cont.)
• Aviv et al. 2012
  – PIN and pattern-password inference
  – Used the Accelerometer sensor for inference
  – Datasets on Nexus One, G2, Nexus S and Droid Incredible from 24 users in two settings
    • Controlled (seated) and uncontrolled (walking)
  – Accuracy of 43% and 73% on PINs and pattern passwords respectively, within 5 attempts

Related Work (cont.)
• Miluzzo et al. 2012
  – QWERTY inference in landscape mode and icon inference in portrait mode
  – Used Accelerometer and Gyroscope sensors combined with ensemble learning
  – Presented a framework called 'TapPrints'
  – Datasets on Google Nexus S, Samsung Galaxy Tab 10.1 and iPhone 4
  – Icon locations inferred with 79% and 65% accuracy on the iPhone and Google Nexus S, respectively
  – Characters inferred with 65% accuracy
  – Some icons and characters inferred with accuracy of up to 90% and 80%, respectively

Challenges
• Gyroscope
  – Noise
    • Typing with trembling hands
    • Typing in different environments, e.g., inside a car
  – Soft touch
    • User taps too softly to induce vibrations
  – Gyroscope drift and bias
• Stereo-microphones
  – Noise
    • Typing in an environment with a lot of background noise
    • Typing in different environments with different noise levels
  – Soft touch
    • Tap sounds do not reach the microphones

Our Approach
• Use a combination of sensors
  – Accelerometer (initially) + gyroscope + stereo-microphones
• Use signal processing and richer data instead of extracted features
  – Complementary filter combining accelerometer and gyroscope, plus a bandpass filter, to remove gyroscope drift and noise
  – Bandpass filter [1.5-3.5 kHz] to reduce audio noise
[Figure: gyroscope filtering; microphones filtering]

Our Approach (cont.)
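The two filtering steps in Our Approach can be sketched briefly.

The complementary filter integrates the gyroscope rate (incremental angle ≈ rate × dT). It then blends in the accelerometer's absolute tilt, so a constant gyroscope bias settles to a bounded offset instead of growing into unbounded drift. The weight `alpha = 0.98` is an assumed, typical value.

For the audio, the slides give only the 1.5-3.5 kHz pass band. This sketch therefore assumes a single second-order band-pass biquad (RBJ Audio EQ Cookbook design) centred on the geometric mean of the band edges. The actual filter order and design are not stated.

Both function names are hypothetical.

```python
import math


def complementary_filter(gyro_rates, accel_angles, dt, alpha=0.98):
    """Fuse gyroscope rates (rad/s) with accelerometer tilt angles (rad).

    alpha weights the integrated gyroscope estimate; (1 - alpha) pulls the
    estimate toward the accelerometer angle, which bounds bias-induced drift.
    """
    if not accel_angles:
        return []
    angle = accel_angles[0]
    fused = []
    for rate, acc_angle in zip(gyro_rates, accel_angles):
        # Incremental angle of rotation ~ rate of rotation * sampling interval.
        angle = alpha * (angle + rate * dt) + (1.0 - alpha) * acc_angle
        fused.append(angle)
    return fused


def bandpass(samples, fs, f_lo=1500.0, f_hi=3500.0):
    """Second-order band-pass biquad (RBJ cookbook, 0 dB peak gain)."""
    f0 = math.sqrt(f_lo * f_hi)   # geometric centre of the pass band
    q = f0 / (f_hi - f_lo)        # Q from the pass-band width
    w0 = 2.0 * math.pi * f0 / fs
    bw_alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + bw_alpha
    # Coefficients normalised by a0 (b1 is zero for this band-pass design).
    b0, b2 = bw_alpha / a0, -bw_alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - bw_alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        # Direct form I difference equation.
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


# Phone at rest, tilted 0.1 rad; the gyroscope reports a constant 0.05 rad/s bias.
dt, steps = 0.01, 2000
gyro = [0.05] * steps
accel = [0.1] * steps
print(f"gyro only: {0.1 + sum(r * dt for r in gyro):.2f} rad")              # → 1.10 rad
print(f"fused:     {complementary_filter(gyro, accel, dt)[-1]:.2f} rad")  # → 0.12 rad
```

Pure integration drifts by bias × time (1 rad after 20 s here). The fused estimate settles within alpha × bias × dT / (1 − alpha) ≈ 0.025 rad of the true tilt.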
• Use a specialized multi-level Meta-Algorithm
  – Use several machine learning algorithms and combine their results
  – Create training models for individual characters
  – Create training models for specific keyboard areas
  – Make predictions on areas first, then on individual keys within the predicted area
[Figure: area division]

Elementary Algorithms
• Machine learning algorithms
  – Supervised classification
  – Selected: Decision Trees (DT), Naïve Bayes (NB), k-Nearest Neighbor (k-NN)
  – Not selected: Hidden Markov Models, Support Vector Machines, Random Forest, Neural Networks

The Meta-Algorithm
[Figure: Meta-Algorithm stages (Area Selection, Individual Models, Area Models, Voting Models)]

Comparison to Previous Work
• Use stereo-microphones for keystroke inference
• Combine motion sensors and acoustics for keystroke inference
• Use richer processed sensor and audio data instead of extracted features
• Use a multi-layer, multi-algorithm approach based on the specifics of the Android keyboard
• Address smaller keyboard dimensions, e.g., exceeding 90% prediction accuracy on the standard QWERTY keyboard
• Demonstrate end-to-end attack feasibility

Evaluation System
• Hardware
  – HTC One (Android 4.4), Samsung S2 & Tab 8 (Android 4.1)
  – No modifications to the OS
• Evaluation application
  – Collects datasets for training and evaluation
  – Custom keyboard for training, with the same layout as the standard keyboard
  – QWERTY & numerical; portrait & landscape
• Datasets
  – 7 participants
  – 5 in an office; 2 in a restaurant (2 unusable)

Evaluation Metrics
• Performance of the Meta-Algorithm
  – For different sensors on different keyboard areas
  – Compared to elementary use of the algorithms
• End-to-end attack
  – On sensor data collected by the Trojan app from sensitive apps

Evaluation (Meta-Algorithm)
• Gyroscope results are location dependent
  – Areas farther from the gyroscope produce more rotation
  – These areas are easier to infer
• Microphone results are typically location independent
  – Inference relies mostly on the speed of sound (arrival-time differences)
• The two could be combined to boost inference accuracy
  – When neither data stream is noisy
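The area-then-key structure of the Meta-Algorithm, and the idea of combining gyroscope and microphone predictions, can be sketched as two rounds of voting. The slides name the stages (area selection, area models, individual models, voting) but not the voting rule. This sketch therefore assumes plain plurality voting, with ties going to the earliest voter. Each trained classifier (for example a DT, NB or k-NN model on gyroscope or microphone data) is replaced by a simple Python callable. `plurality_vote`, `predict_key` and the toy models are hypothetical.

```python
from collections import Counter


def plurality_vote(labels):
    """Return the most frequent label; ties go to the label that appears first."""
    counts = Counter(labels)
    top = max(counts.values())
    return next(label for label in labels if counts[label] == top)


def predict_key(sample, area_models, key_models_by_area):
    """Two-level prediction: first the keyboard area, then the key inside it."""
    # Level 1: every area model (any sensor, any algorithm) votes for an area.
    area = plurality_vote([model(sample) for model in area_models])
    # Level 2: only the models trained on that area's keys vote for the key.
    key = plurality_vote([model(sample) for model in key_models_by_area[area]])
    return area, key


# Toy stand-ins for trained classifiers; a real model would score the sample.
area_models = [
    lambda s: "left",   # e.g., gyroscope + decision tree
    lambda s: "left",   # e.g., microphones + k-NN
    lambda s: "right",  # e.g., gyroscope + naive Bayes
]
key_models_by_area = {
    "left": [lambda s: "q", lambda s: "a", lambda s: "q"],
    "right": [lambda s: "p", lambda s: "l", lambda s: "p"],
}
print(predict_key(None, area_models, key_models_by_area))  # → ('left', 'q')
```

Weighted voting (for instance, trusting the microphones less above a noise threshold) would be a natural variation, but that is an assumption, not something the slides describe.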
[Figure: example results, HTC One QWERTY]

Evaluation (cont.) (Meta-Algorithm)
• Substantial increase in accuracy compared to elementary use of the algorithms
[Figure: accuracy of samples using elementary algorithms vs. using the Meta-Algorithm]

Evaluation (cont.) (Meta-Algorithm)
• Possible to achieve > 90% accuracy for the QWERTY keyboard
• Possible to achieve > 95% accuracy for the number keyboard
• Some sample sets between 44% and 56%
  – Noise > 70 dB
  – Gyroscope drift
• Soft-touch sets < 20%

Evaluation (End-to-End Attack)
• Collected on a banking app with fake numbers
  – Every UI page is known as an activity
  – The Trojan queries the foreground activity every 5 s
• 100 four-digit PIN numbers
  – 376 out of 400 digits predicted correctly (94%)
  – 84 PINs predicted completely correctly
• 100 sixteen-digit credit card numbers
  – 1467 out of 1600 digits predicted correctly (91.7%)
  – 52 numbers predicted completely correctly

Mitigation Techniques
• Sensors bypass the Android security model (sandboxing and permissions)
  – Gyroscope sensor
    • Is not sandboxed
    • Does not require explicit permissions
  – Microphones
    • Require explicit permissions, but with generic descriptions
  – No dynamic control
• One technique: Blocking
  – Obtain a lock on mutually exclusive sensors and hardware
  – Invoke the microphones or camera to deny access to other apps
  – FlaskDroid [Bugiel et al. 2013]

Mitigation Techniques (cont.)
• Alternative technique: Limiting access
  – Blocking is ineffective against the gyroscope sensor
    • It is not mutually exclusive
  – Observation: sampling rate affects inference capability
  – Solution: reduce the sampling rate for background apps to a low but acceptable level

Conclusions
• Stereo-microphone + gyroscope keylogging predictions can exceed 90% accuracy
• Implications of mobile phone sensors for privacy are still not well understood
  – Need for better privacy models in devices loaded with side channels
• Mitigations are needed at all layers of the stack
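The observation that sampling rate affects inference capability can be quantified for the microphones using the delay formula from the Stereo-Microphones (cont.) slide. The largest possible delay is the microphone spacing times the sampling rate, divided by the speed of sound. The sketch below is a hypothetical helper using the HTC One figures (0.134 m, 340 m/s). It shows how the usable delay range shrinks as the rate drops.

The slides apply rate limiting to the gyroscope, not the microphones. This is therefore an analogy for why lower rates leave less signal to classify, not the authors' analysis.

```python
def max_delay_samples(mic_distance_m, sampling_rate_hz, speed_of_sound_m_s=340.0):
    """Largest inter-microphone delay (in samples), reached for a tap in line with both mics."""
    return mic_distance_m * sampling_rate_hz / speed_of_sound_m_s


MIC_DISTANCE = 0.134  # HTC One microphone spacing, in metres

for fs in (192_000, 48_000, 8_000, 1_000):
    print(f"{fs:>7} Hz: delays within ±{max_delay_samples(MIC_DISTANCE, fs):.1f} samples")

# Below this rate every possible delay is shorter than one sample.
print(f"{340.0 / MIC_DISTANCE:.0f} Hz")
```

At 48 kHz this gives about ±18.9 samples, which the slides round to ±19. At 192 kHz it gives about ±75.7 samples. Below roughly 2.5 kHz, no tap position produces even a one-sample difference between the channels.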