Tracking Sensitive Data in Real World Systems Keith Harrison, Dr. Shouhuai Xu

Introduction



Knowing where sensitive data, such as private
cryptographic keys, propagates within a system is
important!
Real-world hardware and software are very complex.
To ensure that software and hardware are
taking the necessary steps to minimize security risks,
a tool is needed to examine exactly where sensitive
data propagates.
The Problem





Sensitive data may be compromised in a
number of ways:
Application Vulnerabilities
Hardware/Kernel Vulnerabilities
Insecure System Setup
Poor physical security
Application Vulnerabilities




Buffer Overflows
Heap Overflow
Format String Vulnerabilities
Many possibilities…
Hardware/Kernel Vulnerabilities



Could allow an unprivileged process to read
data from another process's address space
Kernel or Kernel module bug
Hardware bug
Insecure System Setup



Poor file permissions
“Would you like Windows to remember this
password for you?”
Automatic Logins
Poor Physical Security



An unprivileged user may have unsupervised
physical access to the box
Use of boot disks
BIOS without password protection
What to do?



The naïve approach would be to hide the
sensitive data.
But this doesn’t work!
In “Playing ‘Hide and Seek’ with Stored Keys”,
Adi Shamir and Nicko van Someren
show that hidden private keys can easily be
found, even if the public key is not known, by
identifying sections of high entropy.
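The entropy observation can be illustrated with a small scan. This is a minimal sketch of the general idea, not the method from the paper: the window size, the threshold, and the synthetic `image` are assumptions chosen for illustration.

```python
import math
import os
from collections import Counter


def shannon_entropy(block):
    """Return the Shannon entropy of block in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(block).values())


def high_entropy_windows(data, window=64, threshold=5.0):
    """Yield (offset, entropy) for each non-overlapping window above threshold."""
    for offset in range(0, len(data) - window + 1, window):
        h = shannon_entropy(data[offset:offset + window])
        if h > threshold:
            yield offset, h


# Synthetic example: 128 random bytes (standing in for key material)
# hidden between two runs of repetitive, low-entropy filler.
filler = b"configuration data " * 64      # 1216 bytes, a multiple of 64
image = filler + os.urandom(128) + filler

for offset, h in high_entropy_windows(image):
    print(f"offset {offset}: {h:.2f} bits/byte")
```

A 64-byte window can hold at most 64 distinct values, so its entropy cannot exceed 6 bits per byte. That is why the threshold sits below 6 rather than near 8.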
Take Preventative Action!

Limit the exposure of sensitive data as much
as possible so that if an adversary finds a
vulnerability the chance of compromising the
sensitive data is as low as possible.
How?



Reduce the number of copies of the sensitive
data on disk and in memory as much as
possible.
Use mlock() to avoid sensitive data being
swapped out.
Make sure to clear the memory before
releasing it back to the operating system.
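The last two points can be sketched together. The example below is an illustration under stated assumptions, not a hardened implementation: it assumes a POSIX system (such as Linux) whose C library exports `mlock` and `munlock`, and the helper name `use_secret_locked` is hypothetical. If `mlock()` fails (for example because of a locked-memory limit), the sketch carries on without locking and reports that fact.

```python
import ctypes

# Assumption: a POSIX system (e.g. Linux) whose C library exports mlock/munlock.
libc = ctypes.CDLL(None, use_errno=True)
libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.mlock.restype = ctypes.c_int
libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.munlock.restype = ctypes.c_int


def use_secret_locked(secret, use):
    """Move secret (a bytearray) into a locked buffer, call use(buffer), wipe it.

    Returns (locked, result); locked tells whether mlock() succeeded.
    """
    size = len(secret)
    buf = (ctypes.c_char * size)()           # mutable buffer at a fixed address
    addr = ctypes.addressof(buf)
    locked = libc.mlock(addr, size) == 0     # ask the kernel not to swap it out
    try:
        # Copy without creating an intermediate immutable bytes object.
        src = (ctypes.c_char * size).from_buffer(secret)
        ctypes.memmove(addr, src, size)
        del src
        for i in range(size):                # wipe the caller's copy in place
            secret[i] = 0
        return locked, use(buf)
    finally:
        ctypes.memset(addr, 0, size)         # clear before the memory is released
        if locked:
            libc.munlock(addr, size)


key = bytearray(b"example secret")
locked, length = use_secret_locked(key, len)
print(locked, length, key)
```

The buffer is cleared in the `finally` clause so that it is wiped even if `use` raises an exception.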
This is still not enough.



It would be very hard to guarantee that every
program that handles sensitive data on your system
will do this.
Even if a program's source code zeroes memory
before releasing it to the operating system, compiler
optimizations may omit the instructions that do this in
favor of speed.
In some high-level languages, such as Python or Java, you
may not be able to zero a list or a string because a
copy may be made without your knowledge.
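A small Python illustration of the last point, assuming the secret starts out in a mutable `bytearray`. The names `wipe`, `password`, and `hidden_copy` are made up for this example.

```python
def wipe(buf):
    """Overwrite every byte of the bytearray buf in place with zeros."""
    for i in range(len(buf)):
        buf[i] = 0


password = bytearray(b"correct horse")

# Many APIs expect immutable bytes or str, so a copy is made along the way.
hidden_copy = bytes(password)

wipe(password)

print(password)      # the mutable buffer now holds only zero bytes
print(hidden_copy)   # the immutable copy is untouched and cannot be wiped
```

The program has no way to overwrite `hidden_copy`. It can only drop its reference and wait for the runtime to reclaim the memory, whose contents are not guaranteed to be cleared.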
First Step


As a first step in understanding the problems
associated with protecting sensitive data in
real world systems we need to find a way to
analyze where in the system the sensitive
data propagates to.
Many possibilities: registers, cache, hard
disk, memory, swap space.
Previous and Concurrent Work

All previous and concurrent work found in
this area makes use of a whole-system
simulator, such as the open source Bochs
IA32 Emulator used by Jim Chow et al. in
“Understanding Data Lifetime via Whole
System Simulation”.
But what about a real system?



Any simulator, no matter how clever, may not
precisely duplicate a real system.
Application developers and home users
would not want to run a slow simulator in
order to see what's going on inside their
application.
I focus on memory, hard drive, and swap
space since those are the most vulnerable in
real systems.
Naïve Approach:





Directly read the device (e.g. /dev/hda1)
Directly read through the file system
Read from /dev/mem or /proc/kcore
But this is a very bad idea, because reading via a file
descriptor in any manner will inadvertently create
copies of what you're reading (i.e. the sensitive data)
in memory.
This won’t help us understand what’s really going on!
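To make the problem concrete, here is a sketch of such a naive scanner. The function name `find_pattern` and the temporary file are assumptions for illustration. Every `chunk`, `window`, and `tail` object in it is a fresh in-memory copy of whatever was read, and the operating system will typically keep its own cached copy as well, so the act of searching changes what is being measured.

```python
import os
import tempfile


def find_pattern(path, pattern, chunk_size=4096):
    """Return the byte offsets at which pattern occurs in the file at path."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    offsets = []
    keep = len(pattern) - 1      # bytes carried over to catch boundary matches
    tail = b""
    base = 0                     # file offset of the first byte of `window`
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)           # copy 1: the read buffer
            if not chunk:
                break
            window = tail + chunk                # copy 2: the joined window
            i = window.find(pattern)
            while i != -1:
                offsets.append(base + i)
                i = window.find(pattern, i + 1)
            tail = window[-keep:] if keep else b""   # copy 3: the carry-over
            base += len(window) - len(tail)
    return offsets


# Demo on a throwaway file; the pattern straddles the first 4096-byte boundary.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 4090 + b"SECRETKEY" + b"y" * 100 + b"SECRETKEY")
    demo_path = tmp.name

demo_offsets = find_pattern(demo_path, b"SECRETKEY")
os.unlink(demo_path)
print(demo_offsets)
```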
RSA private keys



Say, for example, we wanted to analyze the data
lifetime of an RSA private key for Apache or
OpenSSH.
The key exists in only one place in the file
system, so this part is trivial.
But what about memory and swap space?
Analyzing Memory




Write two kernel modules!
One kernel module will read physical memory
directly via a pointer and find exactly where in
physical memory this data exists.
The other kernel module will find all physical memory
pages accessible by each process.
By correlating the results of these two modules, we can
see which processes have access to the sensitive
data and whether any sensitive data exists that is not
accessible by any process.
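The correlation step can be sketched in user space once both modules have reported their results. The data formats here are assumptions, since the slides do not specify them: `key_frames` is taken to be a set of physical frame numbers where the key was found, and `process_frames` a mapping from process ID to the frames that process can access. The sample values are made up.

```python
def correlate(key_frames, process_frames):
    """Combine the results of the two scans.

    Returns (exposed, orphaned):
      exposed  maps each pid to the key-bearing frames it can access;
      orphaned is the set of key-bearing frames no process can access.
    """
    exposed = {}
    for pid, frames in process_frames.items():
        hits = frames & key_frames
        if hits:
            exposed[pid] = hits
    reachable = set().union(*process_frames.values())
    orphaned = key_frames - reachable
    return exposed, orphaned


# Hypothetical scan results.
key_frames = {0x1A2, 0x3F0, 0x77C}
process_frames = {
    4021: {0x100, 0x1A2, 0x200},
    4022: {0x1A2, 0x3F0},
    4100: {0x500},
}

exposed, orphaned = correlate(key_frames, process_frames)
print(exposed)
print(orphaned)
```

Frames in `orphaned` correspond to the last case on the slide: copies of the key that remain in physical memory but are no longer accessible by any process.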
Analyzing Swap

Read the device directly. However, this
interferes with memory, so we are forced to
analyze swap space separately from the
analysis of memory.
Simulations





To test the tool, simulations were run
on both Apache and OpenSSH
OpenSSL is used by both Apache and
OpenSSH
An isolated LAN was used for the simulations
Linux 2.6 Kernel
Gentoo Distribution
Simulation Outline







T1 – Daemon Started
T4 – Transactions Started
T7 – Transactions Increased
T10 – Transactions Stopped
T13 – Daemon Stopped
T16 – Free Memory Cleared
T19 – Simulation End
Apache
OpenSSH
Swap


In all recent simulation runs, the RSA private
key never appeared in swap space.
This is most likely because the RSA private
key is frequently used, making it a poor
candidate for swapping to disk.
Advantages of this Approach



Can monitor location of sensitive data in real
time.
No special software or hardware required.
The only requirement is kernel module support,
which is enabled by default in almost every Linux
distribution.
Limitations of this Approach



Our algorithm is fast, but it still takes anywhere from
5 to 30 seconds to run, depending on the amount of
memory in the system and the number of processes
running.
Since we cannot, and would not want to, stop
context switches during this time, our analysis cannot be
counted on to be perfectly accurate.
But it’s still a great tool for visualizing and analyzing
exactly what is going on!
Continuing Work


Find out why the RSA private key is not being erased
before it is released back to the operating
system, and correct this!
Implement in-kernel algorithms to better remove
sensitive data that is not properly cleared before
being released back to the operating system, as
outlined in “Shredding Your Garbage: Reducing Data
Lifetime Through Secure Deallocation” by Jim Chow
et al.
Future Work


Find a way to access swap space from
kernel space.
Perhaps there is a way to look up in reverse
which processes have access to a given
memory page. This would speed up the
algorithm considerably, making it much more
accurate.
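What such a reverse lookup would provide can be sketched as an inverted index from frame number to process IDs. This only illustrates the data structure: the names are hypothetical, and building the index below still walks every process once. The hoped-for speed-up would come from the kernel maintaining such a structure itself, which this sketch merely assumes.

```python
from collections import defaultdict


def build_reverse_map(process_frames):
    """Invert {pid: set of frames} into {frame: set of pids}."""
    reverse = defaultdict(set)
    for pid, frames in process_frames.items():
        for frame in frames:
            reverse[frame].add(pid)
    return reverse


def owners(reverse, frame):
    """Return the pids that can access frame (empty set if none)."""
    return reverse.get(frame, set())


# Hypothetical per-process frame sets.
reverse = build_reverse_map({4021: {0x100, 0x1A2}, 4022: {0x1A2, 0x3F0}})
print(owners(reverse, 0x1A2))
print(owners(reverse, 0x77C))
```

With such an index, only the frames known to contain the key need to be looked up, instead of comparing them against every page of every process.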
Conclusion



Tracking Data Lifetime is Important!
Simulators are a useful tool, but they are not
a REAL system. Real world systems may
behave differently!
Apache and OpenSSH (or OpenSSL) don’t
take appropriate measures to erase sensitive
data before releasing it to the operating
system.