Tracking Sensitive Data in Real World Systems
Keith Harrison, Dr. Shouhuai Xu

Introduction
Knowing where sensitive data, such as private cryptographic keys, propagates within a system is important!
Real world hardware and software are very complex.
To ensure that software and hardware take the necessary steps to minimize security risks, a tool is needed to examine exactly where sensitive data propagates.

The Problem
Sensitive data may be compromised in a number of ways:
  Application Vulnerabilities
  Hardware/Kernel Vulnerabilities
  Insecure System Setup
  Poor Physical Security

Application Vulnerabilities
  Buffer Overflows
  Heap Overflows
  Format String Vulnerabilities
  Many possibilities…

Hardware/Kernel Vulnerabilities
Could allow an unprivileged process to read data from another process's address space
  Kernel or kernel module bug
  Hardware bug

Insecure System Setup
  Poor file permissions
  “Would you like Windows to remember this password for you?”
  Automatic logins

Poor Physical Security
An unprivileged user may have unsupervised physical access to the box
  Use of boot disks
  Non-password-protected BIOS

What to do?
The naïve approach would be to hide the sensitive data. But this doesn’t work!
In “Playing ‘Hide and Seek’ with Stored Keys”, Adi Shamir and Nicko van Someren show that hidden private keys can easily be found, even if the public key is not known, by identifying sections of high entropy.

Take Preventative Action!
Limit the exposure of sensitive data as much as possible, so that if an adversary finds a vulnerability, the chance of compromising the sensitive data is as low as possible.

How?
Reduce the number of copies of the sensitive data on disk and in memory as much as possible.
Use mlock() to prevent sensitive data from being swapped out.
Make sure to clear the memory before releasing it back to the operating system.
This is still not enough. It would be very hard to guarantee that every program that handles sensitive data on your system will do this.
Even if a program's source code zeroes memory before releasing it to the operating system, compiler optimizations may omit the instructions that do this in favor of speed.
In some high-level languages like Python or Java, you may not be able to zero a list or a string, because a copy may be made without your knowledge.

First Step
As a first step in understanding the problems associated with protecting sensitive data in real world systems, we need to find a way to analyze where in the system the sensitive data propagates.
Many possibilities: registers, cache, hard disk, memory, swap space.

Previous and Concurrent Work
All previous and concurrent work found in this area makes use of a whole-system simulator, such as the open source Bochs IA-32 Emulator used by Jim Chow et al. in “Understanding Data Lifetime via Whole System Simulation”.

But what about a real system?
Any simulator, no matter how clever, may not precisely duplicate a real system.
Application developers and home users would not want to run a slow simulator in order to see what's going on inside their application.
I focus on memory, hard drive, and swap space, since those are the most vulnerable in real systems.

Naïve Approach:
  Directly read the device, e.g. /dev/hda1
  Directly read through the file system
  Read from /dev/mem or /proc/kcore
But this is a very bad idea, because reading via a file descriptor in any manner will inadvertently create copies of what you're reading (i.e. the sensitive data) in memory.
This won’t help us understand what’s really going on!

RSA private keys
Say, for example, we wanted to analyze the data lifetime of an RSA private key for Apache or OpenSSH.
The key exists in only one place in the file system, so this part is trivial.
But what about memory and swap space?

Analyzing Memory
Write two kernel modules!
One kernel module will read physical memory directly via a pointer and find exactly where in physical memory this data exists.
The other kernel module will find all physical memory pages accessible by all processes.
Correlating the results of these two modules, we can see which processes have access to the sensitive data and whether any sensitive data exists that is not accessible by any process.

Analyzing Swap
Read the device directly. However, this interferes with memory, so we are forced to analyze swap space separately from the analysis of memory.

Simulations
To test the tool, simulations were run on both Apache and OpenSSH.
  OpenSSL is used by both Apache and OpenSSH
  An isolated LAN was used for the simulations
  Linux 2.6 kernel
  Gentoo distribution

Simulation Outline
  T1 – Daemon Started
  T4 – Transactions Started
  T7 – Transactions Increased
  T10 – Transactions Stopped
  T13 – Daemon Stopped
  T16 – Free Memory Cleared
  T19 – Simulation End

Apache

OpenSSH

Swap
In all recent simulations run, the RSA private key has never appeared in swap space.
This is most likely because the RSA private key is used frequently, making it a poor choice to swap to disk.

Advantages of this Approach
Can monitor the location of sensitive data in real time.
No special software or hardware required.
The only requirement is kernel module support, which is enabled by default in almost every Linux distribution.

Limitations of this Approach
Our algorithm is fast, but it still takes anywhere from 5 to 30 seconds to run, depending on the amount of memory and the number of processes running.
Since we cannot, and would not want to, stop context switches during this time, our analysis cannot be counted on to be perfectly accurate.
But it's still a great tool for visualizing and analyzing exactly what is going on!

Continuing Work
Find why the RSA private key is not being erased before it is released back to the operating system, and correct this!
Implement in-kernel algorithms to better remove sensitive data that is not properly cleared before being released back to the operating system, as outlined in “Shredding Your Garbage: Reducing Data Lifetime Through Secure Deallocation” by Jim Chow et al.

Future Work
Find a way to access swap space from kernel space.
Perhaps there is a way to look up in reverse which processes have access to which memory pages; this would speed up the algorithm considerably, making it much more accurate.

Conclusion
Tracking data lifetime is important!
Simulators are a useful tool, but they are not a REAL system. Real world systems may behave differently!
Apache and OpenSSH (or OpenSSL) don’t take appropriate measures to erase sensitive data before releasing it to the operating system.
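
The correlation step described under Analyzing Memory can be sketched in user space roughly as follows. This is a minimal illustration in Python, not the kernel code used in the study: the function name correlate, the input formats (a list of physical frame numbers where key material was found, and a map from PID to the frame numbers that process has mapped), and the sample values are all assumptions made for the example. Building the frame-to-PID index once is loosely analogous to the reverse lookup suggested under Future Work.

```python
from collections import defaultdict


def correlate(key_frames, process_frames):
    """Relate frames that hold the key to the processes that can reach them.

    key_frames: iterable of physical frame numbers where the key was found
        (hypothetical output of the memory-scanning module).
    process_frames: dict of PID -> iterable of physical frame numbers mapped
        by that process (hypothetical output of the second module).

    Returns (reachable, orphaned): reachable maps each key-holding frame to a
    sorted list of PIDs that map it; orphaned lists key-holding frames that
    no process maps.
    """
    # Build a reverse index once: frame -> set of PIDs that map it.
    owners = defaultdict(set)
    for pid, frames in process_frames.items():
        for frame in frames:
            owners[frame].add(pid)

    reachable = {}
    orphaned = []
    for frame in sorted(set(key_frames)):
        pids = owners.get(frame)  # .get() does not create missing entries
        if pids:
            reachable[frame] = sorted(pids)
        else:
            orphaned.append(frame)
    return reachable, orphaned


if __name__ == "__main__":
    # Made-up sample data, not measurements from the simulations.
    sample_key_frames = [0x1A2, 0x3F0, 0x777]
    sample_process_frames = {
        4021: [0x100, 0x1A2, 0x3F0],
        4022: [0x1A2, 0x200],
    }
    reachable, orphaned = correlate(sample_key_frames, sample_process_frames)
    for frame, pids in reachable.items():
        print(f"frame {frame:#x}: reachable by PIDs {pids}")
    for frame in orphaned:
        print(f"frame {frame:#x}: holds key data but is mapped by no process")
```

Frames reported as orphaned correspond to the case the slides highlight: copies of the key that remain in memory after being released without having been cleared.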