Security Applications For Emulation

silvio.cesare@gmail.com
Speaker details



An independent researcher.
Presented a number of vulnerabilities at the first Ruxcon after auditing the
open source kernels (FreeBSD, NetBSD, Linux, OpenBSD).
Also interested in reverse engineering; speaking at CanSecWest on Linux
malware.
Outline



A presentation examining public research, and the results of my own
research, on the topic of emulation applied to security.
Technology review
Security applications for emulation
 Reverse engineering Cisco IOS Heap Management
 Tracing and evaluating the capabilities of binaries
 Dynamic Taint Analysis
 Automated unpacking
 Symbolic Execution
 Detecting Runtime Errors in Programs
 And introducing a new tool for detecting out of bounds heap
access in the Linux Kernel
Virtualization



Different technologies all sharing similar themes
 Virtualization
 Emulation
 Dynamic Binary Translation
Different types of virtualization
 Full Virtualization provides a simulation of the underlying hardware
 Host performs native execution of the guest as much as possible.
 Not an emulator, so aiming for near native speeds.
 On i386 without hardware virtualization support, privileged code is
binary translated
 Eg VMware, VirtualBox
Virtualization is an important technology, but this presentation focuses on
the host being able to intercept and emulate each individual instruction in
the guest. This is in contrast to virtualization, which executes guest code
natively as much as possible, with little general host interception.
Emulation and Dynamic Binary Translation


Emulation
 The emulator fetches, decodes and executes instructions one at a time
 Different types of emulators: whole system emulators capable of
running unmodified guest operating systems, or emulators only
capable of running applications on specific systems.
 Guest state is maintained in software, including the CPU, system
memory, and for whole system emulators, hardware devices.
 Eg Bochs
 Used in the open source automated unpacker, Pandora's Bochs.
Dynamic Binary Translation
 A faster form of emulation
 Caches blocks of decoded and translated instructions
 Eg QEMU
 Used in Argos, a system for capturing 0day*
 Used in my MemCheck tool for detecting Linux kernel heap
access bugs*.
Dynamic Analysis and Emulation





An emulator can be used to implement dynamic analysis.
Dynamic Analysis means running a program and seeing what's going on as
it executes, eg as in a debugger
 It can mean identifying specific behaviors in the program, such as how
the program accesses memory, transfers execution control, or treats
network data.
Dynamic analysis using a debugger is prone to anti-debugging tricks, and
is very cumbersome when applied in a kernel context.
A robust solution is to perform dynamic analysis from inside an emulator.
 Hooks are added in the fetch/decode/execute loop of an emulator.
 When modifying a dynamic binary translator, instrumentation or
callbacks are generally added to the translated code blocks.
All the applications of emulation presented here are related to, or are
applications of, dynamic analysis.
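The Python sketch below illustrates a hook in the fetch/decode/execute loop. It runs a hypothetical three-opcode toy ISA and calls every registered analysis hook after decode and before execute. The ISA and the names ToyEmulator and add_hook are invented for illustration. A real emulator such as Bochs decodes x86 and keeps far more guest state.

```python
from typing import Callable, Dict, List, Tuple

Instr = Tuple  # (opcode, operand, ...) in a hypothetical toy ISA
Hook = Callable[["ToyEmulator", Instr], None]

class ToyEmulator:
    """Interpreter for a made-up ISA with per-instruction analysis hooks."""

    def __init__(self, program: List[Instr]):
        self.program = program
        self.regs: Dict[str, int] = {"r0": 0, "r1": 0}
        self.pc = 0
        self.hooks: List[Hook] = []

    def add_hook(self, hook: Hook) -> None:
        self.hooks.append(hook)

    def step(self) -> bool:
        if self.pc >= len(self.program):
            return False                      # ran off the end: halt
        instr = self.program[self.pc]         # fetch
        op, *args = instr                     # decode
        for hook in self.hooks:               # analysis hooks see guest state
            hook(self, instr)                 # before the instruction executes
        if op == "mov":                       # execute
            self.regs[args[0]] = args[1]
        elif op == "add":
            self.regs[args[0]] += self.regs[args[1]]
        elif op == "jmp":
            self.pc = args[0]
            return True
        else:
            raise ValueError(f"unknown opcode {op!r}")
        self.pc += 1
        return True

    def run(self, max_steps: int = 10_000) -> None:
        for _ in range(max_steps):
            if not self.step():
                break
```

A tracing hook such as `emu.add_hook(lambda e, i: print(e.pc, i))` sees every guest instruction. This is the same interception point a debugger-free dynamic analysis relies on.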
Part i)
Reverse Engineering Cisco IOS's Heap Management
Reverse Engineering Cisco IOS with Dynamips




Dynamips is an open source emulator and binary translator of Cisco
hardware running PPC/MIPS IOS images.
 Potential future development environment for IOS exploits.
 Dynamic analysis of IOS*
My experience is with IOS on MIPS
 IOS MIPS images use an invalid ELF e_machine field.
 Some IDA (5.2) bugs with MIPS (turn off macros to work around them).
Dynamic analysis can identify heap management functions in IOS and
provide a means to potentially implement Valgrind style heap checkers.
 It can also be used to reverse engineer other components of IOS.
Dynamic analysis differs from the static approach, and has some
advantages
 It can be completely automated
 Since the behavior of the IOS implementation is relatively constant,
this method can work across different IOS images, provided that new or
obsolete features aren't being examined
IOS Heap Management Basics




Well documented public research on developing heap based buffer
overflow exploits describes the general heap layout.
IOS heap allocated buffers have a header appearing directly before the
buffer, and a trailer that follows the buffer.
These 'chunks' form a doubly linked list.
Chunk header begins with a known constant
 This fact is used later in the analysis.
Dynamic Analysis Approach



Knowing the header constant of a malloc chunk enables us to track
memory allocations by intercepting writes to memory of that particular
constant.
Heap management is slightly different in a kernel, but a kernel or user
mode alloc/free still has a set of expected semantics and prototypes.
 An alloc(ation) function returns a pointer to an allocated buffer.
 But don't expect the allocation size to be the only argument;
eg kmalloc in Linux also takes flags.
 Free might have multiple arguments also, but one of those arguments
is certainly a pointer to an allocated buffer.
By tracking allocations, and checking the behavior of functions, we can
infer the locations of malloc and free.
Identifying Functions with Dynamic Analysis


Finding malloc
 Track writes to memory that write the constant that identifies a malloc
chunk.
 Track procedure exits, checking the return value for a pointer to a
known allocated buffer. This return value is the chunk location +
chunk header length.
 The first function to return an allocated buffer is malloc, but sample a
number of times to be sure.
Finding free
 Find two malloc calls that return the same memory
 Free must have occurred between mallocs since logically, allocated
buffers can't overlap.
 Track procedure calls with an argument matching freed memory, eg
free(ptr)
 Sample a large enough set; the function common to all samples is free.
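The malloc and free inference steps above can be sketched over a recorded trace. This is a minimal sketch that assumes a hypothetical trace format of ("write", addr, value), ("call", func, args) and ("ret", func, retval) tuples. MAGIC and HDR_LEN are placeholders, not the real IOS header constant or header length.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

MAGIC = 0xAB1234CD  # placeholder for the IOS chunk-header constant
HDR_LEN = 0x28      # placeholder header length; returned ptr = chunk + HDR_LEN

Event = Tuple  # ("write", addr, value) | ("call", func, args) | ("ret", func, retval)

def find_malloc(trace: Iterable[Event]) -> Optional[str]:
    """Vote for the first function to return each freshly initialised buffer."""
    fresh = set()      # buffers whose chunk header was just written
    votes = Counter()
    for ev in trace:
        if ev[0] == "write" and ev[2] == MAGIC:
            fresh.add(ev[1] + HDR_LEN)
        elif ev[0] == "ret" and ev[2] in fresh:
            votes[ev[1]] += 1      # innermost function returns first; wrappers don't count
            fresh.discard(ev[2])
    return votes.most_common(1)[0][0] if votes else None

def find_free(trace: Iterable[Event], malloc_name: str) -> Optional[str]:
    """Intersect the functions passed a pointer between two mallocs returning it."""
    calls = []             # (index, func, args) for every call seen so far
    last_returned = {}     # ptr -> trace index of the latest malloc returning it
    candidates = None
    for i, ev in enumerate(trace):
        if ev[0] == "call":
            calls.append((i, ev[1], ev[2]))
        elif ev[0] == "ret" and ev[1] == malloc_name:
            ptr = ev[2]
            if ptr in last_returned:
                start = last_returned[ptr]
                # buffers can't overlap, so ptr must have been freed in between
                funcs = {f for j, f, args in calls if start < j < i and ptr in args}
                candidates = funcs if candidates is None else candidates & funcs
            last_returned[ptr] = i
    if candidates is not None and len(candidates) == 1:
        return next(iter(candidates))
    return None   # not enough samples yet to single out free
```

Functions that merely use the buffer (memcpy, for example) drop out of the intersection once enough malloc/free cycles have been sampled.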
Testing the results with a double free and overlapping allocation checker.




How can we determine if malloc and free are the only heap management
functions?
The solution is to trace those functions while running IOS, building our
own representation of the heap, all the while checking for consistency in
our representation.
Certain conditions should always be true in a well managed heap. If any
of these assertions fails, our model of the heap is incorrect.
 Only allocated memory can be freed.
 Allocated memory can not overlap.
This results in a checker that can be used to detect double free bugs in
IOS, as they happen, much like Valgrind. But IOS checks the consistency
of the heap regularly and also during free, so the checker is probably only
useful for automated analysis.
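Below is a minimal sketch of the consistency checker. It assumes that traced malloc calls report the returned pointer and the requested size. The names HeapModel and HeapError are invented for illustration. The two checks are the assertions listed above: only allocated memory can be freed, and allocations must not overlap.

```python
import bisect

class HeapError(Exception):
    """Raised when the traced heap violates an invariant."""

class HeapModel:
    """Our own model of the guest heap, rebuilt from traced malloc/free calls."""

    def __init__(self):
        self.starts = []   # sorted start addresses of live buffers
        self.sizes = {}    # start address -> size

    def on_malloc(self, ptr, size):
        i = bisect.bisect_left(self.starts, ptr)
        if i > 0:                                   # does the previous buffer run into ptr?
            prev = self.starts[i - 1]
            if prev + self.sizes[prev] > ptr:
                raise HeapError(f"{ptr:#x} overlaps live buffer {prev:#x}")
        if i < len(self.starts) and self.starts[i] < ptr + size:
            raise HeapError(f"{ptr:#x} overlaps live buffer {self.starts[i]:#x}")
        self.starts.insert(i, ptr)
        self.sizes[ptr] = size

    def on_free(self, ptr):
        if ptr not in self.sizes:
            raise HeapError(f"free of unallocated {ptr:#x} (double free?)")
        del self.sizes[ptr]
        self.starts.remove(ptr)
```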
Detecting IOS 0-day





Another type of IOS checker could potentially be made to detect 0-day
attacks.
IOS exploitation uses corrupted malloc chunks that are subsequently
freed.
 Freeing the corrupt chunk causes an arbitrary write to memory.
The checker could confirm the consistency of header attributes such as the
size of each chunk through the interception of free calls.
For more complete coverage, the chunk header could be retrieved and
stored after every malloc, subsequently being verified before free.
In a roll-out, honeypots could automatically detect mass 0-day
exploitation and raise an alarm.
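The header snapshot idea can be sketched under a few stated assumptions. Guest memory is modelled as a bytearray, and the header layout is hypothetical: a 32-bit magic followed by a 32-bit size, big-endian, assuming a big-endian MIPS target. Only fields expected to stay constant are compared. The list pointers in a real header change legitimately as neighbouring chunks are allocated and freed.

```python
import struct

HDR_LEN = 0x28                  # placeholder header length
STABLE = struct.Struct(">II")   # hypothetical: magic, size at the start of the header

class HeaderGuard:
    """Snapshot stable chunk-header fields after malloc, verify them before free."""

    def __init__(self, memory: bytearray):
        self.mem = memory
        self.saved = {}

    def _fields(self, ptr):
        return STABLE.unpack_from(self.mem, ptr - HDR_LEN)

    def after_malloc(self, ptr):
        self.saved[ptr] = self._fields(ptr)

    def before_free(self, ptr) -> bool:
        """True if the header is intact; False means corruption (raise an alert)."""
        expected = self.saved.pop(ptr, None)
        return expected is not None and self._fields(ptr) == expected
```

In a honeypot, a False result from before_free would be the trigger for an alarm.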
Reference Counting.





Tracing malloc and free shows cases where the same memory is freed
twice, ie an apparent double free.
Potentially this could indicate a bug in IOS, but there are simply too many
alerts to be meaningful.
In fact, as suspected by other researchers, it turns out that allocated
buffers are reference counted.
Before each apparent double free there is a call to increment the reference
count (IncRefCnt) of the buffer, so the first free simply decrements the
count without actually freeing the memory.
MIPS has an atomic addition instruction, used only for incrementing the
malloc chunk refcnt.
 Any procedure that uses this instruction on a malloc chunk is
IncRefCnt.
 For other architectures, the refcnt field in the malloc chunk is at a
fixed offset, and writes to this address may also indicate the location
of IncRefCnt.
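Modelling the reference count removes the false double free alerts. The sketch below assumes the tracer reports calls to malloc, IncRefCnt and free by buffer pointer. The class and method names are illustrative only.

```python
class DoubleFree(Exception):
    pass

class RefCountedHeap:
    """Heap model that honours IncRefCnt, so free only releases at refcount 0."""

    def __init__(self):
        self.refcnt = {}

    def on_malloc(self, ptr):
        self.refcnt[ptr] = 1

    def on_incref(self, ptr):
        self.refcnt[ptr] += 1

    def on_free(self, ptr) -> bool:
        """Return True if the memory is really released, False if only decremented."""
        if ptr not in self.refcnt:
            raise DoubleFree(f"{ptr:#x} freed with no outstanding references")
        self.refcnt[ptr] -= 1
        if self.refcnt[ptr] == 0:
            del self.refcnt[ptr]
            return True
        return False
```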
MallocLite






Tracing also reveals the appearance of overlapping memory allocations.
In later versions of IOS, the 'MallocLite' implementation is used.
A 64k allocation is used which is subsequently subdivided for use in
allocations <= 128 bytes.
This feature may affect the writing of heap exploits and should be taken
into account.
If malloc recursively calls itself, requesting 64k of memory, then
MallocLite is allocating this larger block of memory.
For tracing, ignoring recursive allocations works.
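Ignoring recursive allocations can be done by tracking call depth. The sketch assumes a trace of ("call", func, size) and ("ret", func, ptr) events in execution order. Only allocations made at the outermost malloc level are kept, so the nested 64k request that refills the MallocLite pool is dropped.

```python
from typing import List, Sequence, Tuple

def outermost_allocations(events: Sequence[Tuple[str, str, int]],
                          malloc_name: str = "malloc") -> List[Tuple[int, int]]:
    """Return (size, ptr) for outermost malloc calls only."""
    stack = []           # sizes of malloc calls currently in progress
    result = []
    for kind, func, value in events:
        if func != malloc_name:
            continue
        if kind == "call":
            stack.append(value)
        elif kind == "ret":
            size = stack.pop()
            if not stack:                # not nested inside another malloc
                result.append((size, value))
    return result
```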
Cisco IOS TODO




The malloc tracer could potentially be used to implement a Valgrind style
MemCheck tool to detect out of bounds heap access.
 This could be used alongside fuzzing to provide more accurate
detection of vulnerabilities when they happen.
Easy to implement, but the initial attempt resulted in too many false
positives.
Problem: There are other functions that have direct access to internal heap
structures besides malloc, free and IncRefCnt, eg CheckHeaps.
 More reversing is required.
 If Cisco gave me access to the source, I'm pretty sure I could whack
this out in a week ;-)
The MemCheck concept was later successfully implemented for the Linux
Kernel, since its source code is openly available.
Cisco IOS Summary



By modifying the open source Cisco emulator, dynamips, dynamic
analysis of IOS is possible.
Dynamic Analysis of IOS can aid in reverse engineering.
Potentially one day we will have a Valgrind style IOS memory checking
tool, or in the near future a 0-day detection tool.
Part ii)
Tracing execution and evaluating the capabilities of binaries and potential malware
Tracing and evaluating the capabilities of binaries





Run the binary inside a sandboxed environment, logging events of
interest.
 System calls, registry changes, files accessed, process management,
services started or stopped etc.
Public websites offer free online services to evaluate binaries and potential
malware.
A trace is useful for quickly determining what a binary is doing.
 May help in determining if a binary is malicious.
A non emulated approach is to trace the binary using a debugger based
tool from userspace within a VM.
 Malware is almost certain to use anti-debugging tricks, which may make
tracing problematic.
Another approach is to perform the execution inside an emulator.
 The emulated approach is very resistant to modern anti-debugging tricks.
TTAnalyze






TTAnalyze: A Master's thesis that presented a closed source fork of QEMU
that logged Windows system calls.
Important as other techniques such as automated unpacking are based on
similar methods and the thesis clearly describes the implementation.
Windows XP running as a guest, emulated by a fork of QEMU in the host.
Host uploads binary to guest using virtual network created by VM.
Binary is executed in guest environment.
Host monitors execution and logs events of interest.
TTAnalyze concepts




Host emulator intercepts every instruction.
It identifies instructions that belong to the process being monitored.
 How to know what code is part of the process we wish to monitor?
 CR3 register (the page directory base address) is unique for each
process.
 The kernel maintains a process list (EPROCESS structures) holding these
addresses.
 An instruction of the target process may be executing either kernel
code or user code.
 For our target process, kernel code is running when EIP >= 0x80000000
(with the default 2 GB/2 GB user/kernel split).
For the target process, it checks EIP, and if EIP points to the start of a
Windows API function it logs the event.
It also logs returns from Windows API calls.
 To know the address of each Windows API function, it uses the PEB
of the target process to retrieve the list of all loaded DLLs.
 The exported functions of each DLL are parsed, and their addresses noted.
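The filtering described above can be sketched as a per-instruction callback. This is a minimal sketch with several assumptions. The emulator supplies CR3, EIP and the dword at [esp] for each instruction. The api_table has already been built from the PEB's module list. The default 2 GB user/kernel split applies. The class name is hypothetical.

```python
KERNEL_BASE = 0x80000000  # default 32-bit Windows user/kernel split

class ApiMonitor:
    """Log Windows API entries and returns for one process, TTAnalyze style."""

    def __init__(self, target_cr3, api_table):
        self.target_cr3 = target_cr3
        self.api_table = api_table   # API entry address -> name, built from the PEB
        self.pending = []            # stack of (api name, return address)
        self.log = []

    def on_instruction(self, cr3, eip, stack_top):
        """stack_top is the dword at [esp]; on API entry it is the return address."""
        if cr3 != self.target_cr3:
            return                   # instruction belongs to another process
        if eip >= KERNEL_BASE:
            return                   # target process executing kernel code
        if self.pending and eip == self.pending[-1][1]:
            name, _ = self.pending.pop()
            self.log.append(("ret", name))
        name = self.api_table.get(eip)
        if name is not None:         # exact EIP match on an API entry point
            self.pending.append((name, stack_top))
            self.log.append(("call", name))
```

Because entry detection is an exact dictionary lookup on EIP, a jump into the middle of an API function goes unlogged. This is the weakness listed under the attacks below.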
TTAnalyze Implementation



A component that executes inside the guest system
 Kernel driver to parse kernel EPROCESS list, to obtain the page
directory address (CR3), and PEB of the target process.
RPC mechanism to control guest operations from host
 uploading executables to guest
 Controlling execution of the target process, which is initially started in
a suspended state to allow querying.
 Querying the kernel driver for the page directory base (CR3) and PEB.
QEMU modifications
 Identifying the process of interest using the CR3 result from the guest
kernel driver.
 The PEB is used to establish a list of addresses for each Windows
API call in a DLL*
 Identifying entering and leaving windows API calls in the guest, based
on intercepting each instruction and checking EIP.
TTAnalyze Implementation Challenges


Arguments for system calls which reside in virtual memory might be
paged out.
 QEMU's page fault handler detects the condition, then alters guest code
to access the target memory, paging it in.
Malware can use the Native API directly.
 Understanding this requires unofficial documentation of the API.
 Trap native calls by checking each instruction for an OS trap (int 2e or
sysenter).
TTAnalyze Attacks



Malware might evade the detection of Windows API calls, since detection
depends on exact EIP matching.
 Detection fails if malware doesn't jump to the very beginning of a
function, eg the caller might execute the callee's prologue itself and
jump past it
Malware might detect guest changes.
 Communication channel between host and guest.
 Kernel driver component.
 See Pandora's Bochs (An automated unpacker) implementation with
no guest changes.
Malware might detect system emulators
 CPU bugs (documented in errata) are generally not implemented
 Model Specific Register implementations differ between CPU
vendors.
Binary Tracing Summary


Existing software that traces binaries with a userland, debugger-based
tool in a VM is vulnerable to many anti-debugging tricks.
An emulator can present a solution to that problem.
Part iii)
Using emulation for dynamic taint analysis
Dynamic Taint Analysis





A technique used to analyze the flow of data in a program.
 Has applications in identifying vulnerabilities as they happen, eg Argos.
 Has also been used to identify spyware, eg, BitBlaze.
 Is a general concept that can be used in a number of applications,
including symbolic execution.
Traces the flow of data, instruction by instruction, from a source that
generates 'tainted' data, to sinks where the data is used.
Variables, registers and memory are tagged as being tainted or clean.
The destination operand of an instruction becomes tainted when a source
operand is tainted.
Sometimes it's useful that data can become untainted by certain operations.
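Below is a minimal sketch of instruction-level taint propagation for two-operand, x86-style instructions. Locations are plain strings (register names or "mem:..." labels), which is a simplification. The xor r, r case shows an operation that untaints its destination, because the result no longer depends on the input.

```python
class TaintState:
    """Instruction-level taint tracking over named locations."""

    def __init__(self):
        self.tainted = set()

    def source(self, loc):
        self.tainted.add(loc)          # eg bytes just read from the network

    def is_tainted(self, loc):
        return loc in self.tainted

    def _propagate(self, dst, srcs):
        if any(s in self.tainted for s in srcs):
            self.tainted.add(dst)
        else:
            self.tainted.discard(dst)  # overwritten by clean data

    def step(self, op, dst, src):
        """Apply a two-operand instruction `op dst, src`."""
        if op == "mov":
            deps = (src,)
        elif op in ("add", "sub", "and", "or", "xor"):
            if op in ("xor", "sub") and src == dst:
                deps = ()              # xor r,r / sub r,r always give 0: untaint
            else:
                deps = (dst, src)      # dst = dst OP src
        else:
            raise ValueError(f"unsupported op {op!r}")
        self._propagate(dst, deps)
```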
Dynamic Taint Analysis in Vulnerability Detection






Dynamic Taint Analysis has been applied to detecting vulnerabilities such
as SQL injection, or incorrect use of the Unix exec*() or system() calls
which run executables.
Sources of user input, that is untrusted data, taint the data they produce.
The flow of untrusted data is followed by taint analysis.
If untrusted data is checked in a condition, input validation is deemed to
have occurred, so the data is untainted.
At the site of exec*(), system(), or even mysql_query, check that the
argument is not tainted.
If it is tainted, untrusted data is assumed to have reached privileged code
and a vulnerability has occurred.
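The source, validation and sink scheme can be sketched at the level of string values. This is a simplified, value-level illustration, similar in spirit to a scripting language's taint mode, rather than the instruction-level tracking described earlier. The helper names are hypothetical.

```python
class Tainted(str):
    """A string whose contents came from an untrusted source."""

def concat(a: str, b: str) -> str:
    out = str(a) + str(b)
    # the result is tainted if either input was
    return Tainted(out) if isinstance(a, Tainted) or isinstance(b, Tainted) else out

def validate(s: str, allowed: str) -> str:
    """A successful check counts as input validation, so return untainted data."""
    if not set(s) <= set(allowed):
        raise ValueError("input rejected")
    return str(s)          # str() of a str subclass yields a plain str

def check_sink(name: str, arg: str) -> None:
    """Call at exec*(), system() or mysql_query() before the real call."""
    if isinstance(arg, Tainted):
        raise PermissionError(f"untrusted data reached {name}()")
```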
Argos: A tool for detecting 0day attacks






Uses dynamic taint analysis to detect 0day attacks.
An open source fork of QEMU.
Detects exploits as they are happening and automatically generates
vulnerability signatures.
The vision is of an automatic worm defense system.
 Honeypots detect 0day attacks.
 Generates and delivers vulnerability signatures to intrusion prevention
systems
Argos works by dynamic taint analysis of network data which is considered
untrusted.
 Taints data returned from QEMU emulated network driver.
Exploits are detected when there is code redirection under attacker control.
 If EIP becomes tainted (under the control of the attacker)
 If EIP points to tainted data.
 Execve system calls checked for tainted arguments.
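The sketch below shows Argos-style control-flow checks. It assumes byte-granularity taint on memory filled by the emulated network card and a simple register taint set. Argos itself tracks taint inside QEMU. The class and method names here are illustrative only.

```python
class ArgosLikeMonitor:
    """Byte-level network taint with Argos-style control-flow checks."""

    def __init__(self):
        self.tainted_mem = set()    # guest addresses holding network-derived bytes
        self.tainted_regs = set()

    def network_input(self, addr, data: bytes):
        self.tainted_mem.update(range(addr, addr + len(data)))

    def load(self, reg, addr, size=4):
        """reg <- [addr]; the register inherits the taint of the bytes loaded."""
        if any(a in self.tainted_mem for a in range(addr, addr + size)):
            self.tainted_regs.add(reg)
        else:
            self.tainted_regs.discard(reg)

    def control_transfer(self, src_reg, target):
        """Check a ret/jmp/call whose target came from src_reg; return an alert or None."""
        if src_reg in self.tainted_regs:
            return "EIP is tainted"
        if target in self.tainted_mem:
            return "EIP points to tainted data"
        return None
```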
Dynamic Taint Analysis Summary



Dynamic Taint Analysis is a technique used to track the flow of data.
Important because it can be used as a general technique in more applied
topics.
Has applications including vulnerability detection, and is used in places
like symbolic execution.
Part iv)
Automated Unpacking
Packers




A packer rewrites an executable, wrapping a new layer of code around the
original program.
 Essentially becomes an executable inside an executable.
A packer is used to compress, obfuscate or encrypt the original executable
 Today almost all malware is packed.
 Packers originally used for compression
 I remember packers (or crunchers) from the early 90's, and had 2
floppy disks full of them, for the Commodore 64!
The resulting packed executable consists of a runtime unpacking layer and
a binary blob of the compressed or obfuscated original program.
At runtime, the unpacking layer decompresses the blob, writing the original
executable to memory. It then transfers execution to the original
code.
 Not all packers follow this behavior. Some packers convert the
original executable to PCODE. At runtime the packed executable acts
as a VM.
Unpacking




Unpacking is the process of extracting the original executable from a
packed image.
The manual approach is to run the packed executable in a debugger, letting
the unpacking stub write the original image to memory, and breaking (in
the debugger) when execution transfers to the now unpacked image.
Memory is then dumped, and the image rebuilt so it's a valid executable
again.
 Requires fixing the Import Address Table.
 ImpRec can do this.
Debugger scripts can automate the process for specific packers by
identifying instruction sequences that indicate which stage the unpacking
stub is in.
Automated Unpacking






Unpacking can be automated.
Run the packed executable.
Track all memory writes made by the executable.
If execution transfers to a previously written memory location, then
unpacking is deemed to have occurred.
It may be necessary to repeat this, as multiple layers may exist.
Public automated unpackers available from Offensive Computing, and
also Pandora's Bochs.
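The write-then-execute heuristic is small enough to sketch directly. This minimal sketch assumes the emulator reports every memory write by the target and every instruction address it executes. On a hit, a real unpacker would dump memory at that point and rebuild the import table.

```python
class UnpackDetector:
    """Flag execution of memory the target itself wrote earlier."""

    def __init__(self):
        self.written = set()   # byte addresses written since the last detected layer
        self.layers = []       # entry point of each unpacked layer (OEP candidates)

    def on_write(self, addr, size):
        self.written.update(range(addr, addr + size))

    def on_execute(self, eip):
        """Call for every instruction; True when a new unpacked layer is entered."""
        if eip in self.written:
            self.layers.append(eip)   # a real unpacker would dump memory here
            self.written.clear()      # start over to catch the next layer
            return True
        return False
```

Tracking individual bytes keeps the sketch short. Tracking at page granularity, as the page protection approaches do, uses far less memory.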
Automated Unpacking Implementation Approaches

Multiple approaches in implementation
 Use hardware page protection in the OS to track writes and execution, eg
Offensive Computing. This results in high performance.
 If running inside a virtualized environment like VMWare, VM
might be detected. Offensive Computing recommend using a real
goat machine.
 Dynamic Instrumentation or complete emulation of packed program
to track memory writes and execution.
 Offensive Computing use instrumentation approach with Intel PIN
framework.
 Pandora's Bochs uses the Bochs emulator.
Automated Unpacking using an Emulator




Emulation is a mature closed source technology used by antivirus software
 The original use of emulation was to detect polymorphic viruses, but it
is now used for unpacking also.
A typical antivirus emulator emulates both the instruction set and parts of
the operating system.
 This is how I wrote my own automated unpacker and emulator.
 There are no software licensing problems since the emulator is only a
regular piece of software.
Another approach is to use a whole system emulator such as Bochs or
QEMU running an installed OS.
Non emulated approaches are more likely to be detected, or to be susceptible to the anti-debugging tricks employed by malware.
Using an AV style Emulator as a CPU checker






While developing my AV style emulator, a need arose to verify the
emulation.
I implemented a program tracer to trace programs in parallel with emulation
 The tracer needed to automatically evade anti-debugging tricks
 Instructions that would indicate the program was being debugged
needed to be emulated (eg, reading EFLAGS via pushf/popf, rdtsc, or a
software int1 being confused with single stepping)
 Library calls also (eg, Process32*, which shows the debugger in the
process list, and IsDebuggerPresent)
For each traced instruction, the emulator executes the same instruction.
The CPU state from the tracer is verified against the state of the emulator,
and checked for consistency.
Some instructions produced differences between emulation and tracing,
not due to a fault of the emulator or tracer.
CPU bugs: some instructions don't follow Intel's specifications.
 Not setting/clearing processor status flags
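The lockstep comparison can be sketched as a register-by-register diff. This assumes both the tracer and the emulator expose their state as dictionaries of 32-bit values. The ignore_flags_mask parameter is a hypothetical way to accommodate flags that Intel documents as undefined for particular instructions. It does not model the CPU bugs mentioned above, which still show up as differences.

```python
REGS = ("eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp", "eip", "eflags")

def compare_states(traced, emulated, ignore_flags_mask=0):
    """Return [(reg, traced_value, emulated_value)] for every mismatching register."""
    diffs = []
    for reg in REGS:
        t, e = traced[reg], emulated[reg]
        if reg == "eflags":
            # mask out flags treated as undefined for the instruction just executed
            t &= ~ignore_flags_mask
            e &= ~ignore_flags_mask
        if t != e:
            diffs.append((reg, traced[reg], emulated[reg]))
    return diffs
```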
Automated Unpacking using an Emulator: Implementation





The required changes to an emulator involve modifying the software MMU to
track memory writes, and checking each instruction to see if EIP
matches any address where a memory write has occurred.
Similar problems as TTAnalyze are present in determining what code is
part of the target process.
The Renovo unpacker from the BitBlaze project follows the TTAnalyze
approach in starting the executable in a suspended state, and then using a
kernel driver in the guest to find the page directory base address of the
process.
Pandora's Bochs uses an unmodified guest system and instead watches for
changes in the CR3 register to identify the target process.
To determine the value of CR3, it takes into account that in kernel mode
Windows uses the fs register to reference a known structure leading to the
EPROCESS list, which, as in TTAnalyze, contains the page directory base
address (CR3) of each process.
Attacks against Automated Unpackers and Emulators


Malware might make use of unimplemented emulation of the architecture,
instruction set or operating system
 For AV emulators, use of obscure libraries.
 For whole system emulators, detection of the emulator. Malware
might check existence of known CPU errata.
Malware might require activation (eg, over the Internet), or activate only
occasionally.
Attacks (cont): Virtual Machine Packers






The packer translates the executable into PCODE.
At runtime, PCODE is decoded and executed in the style of a virtual
machine.
PCODE can be polymorphic.
This type of packer doesn't follow the 'write to memory then execute'
algorithm.
Eg, Themida, but fortunately these packers are not as common in current
malware.
There is no automated method of unpacking an unknown packer of this
type.
Automated Unpacking Summary



Automated unpacking works by intercepting execution at previously
written memory addresses.
Multiple approaches to implementation; emulation has some advantages.
Automated unpacking doesn't work on VM based packers.
Part v)
Using emulation to design and implement symbolic execution
Symbolic Execution





A technique used to analyze programs.
For unknown input to a program, it maintains generalized information on
program state, systematically exploring program paths.
 Strictly, this is a definition of mixed (concrete and symbolic) execution.
Execution proceeds by emulating instructions and using symbolic formulas
instead of concrete data for user defined input.
 Example symbolic data can be network packet contents, program
arguments, file contents etc
Symbolic formulas describe all program states along a program path for
arbitrary user input, that is, every value the data can possibly hold
while the formulas remain true.
Bug finding is equivalent to solving the equations.
 Eg, is this pointer being dereferenced ever equal to 0, given arbitrary
user input?
 And if so, what is the user input that generates that bug?
SMT Based Constraint Solvers






Symbolic equations are generated for instructions that have symbolic
arguments.
Conditional instructions generate equations which are constraints (eg, x <
10)
Equations are handled by Satisfiability Modulo Theories (SMT) solvers.
Efficient SMT solvers are a relatively recent achievement of the past
decade.
 An annual SMT competition pits solvers against each other.
 Microsoft has their own solver which is free to use, but not open
source.
 A number of open source solvers available.
SMT Solver can be queried, given a set of equations and constraints, to
see if certain queried constraints are true.
 Can easily determine if a symbolic pointer can be null.
SMT solvers can also generate concrete solutions from symbolic
equations
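To make the query and solution idea concrete without depending on a particular solver's API, the sketch below substitutes brute force over an 8-bit input for a real SMT solver. Constraints are Python predicates over the symbolic input x. The answer to a query is either a concrete input or None (unsatisfiable).

```python
from typing import Callable, List, Optional

Constraint = Callable[[int], bool]

def solve(constraints: List[Constraint], bits: int = 8) -> Optional[int]:
    """Return an input satisfying every constraint, or None if none exists.

    Exhaustive search over a tiny domain stands in for an SMT solver, which
    would decide the same question symbolically over 32/64-bit bit-vectors.
    """
    for x in range(1 << bits):
        if all(c(x) for c in constraints):
            return x
    return None
```

For example, with the made-up formula `ptr = lambda x: (0xFFFFFFF0 + 4 * x) & 0xFFFFFFFF`, the query `solve([lambda x: x < 10, lambda x: ptr(x) == 0])` returns 4. That is the concrete input for which 32-bit wraparound makes the pointer null.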
Applications of Symbolic Execution



As a Bug checker
 Dawson Engler's closed source C checker EXE, which could detect
buffer overflows, null pointer dereferences and divisions by zero.
 The open source Catchconv – which doesn't explore program paths,
but checks assertions on a given set of input using symbolic execution
to find signedness bugs.
Intelligent fuzzing
 Symbolic Execution can automatically enumerate the paths and data in
a program that fuzzing normally misses, aiming towards complete
automated code coverage.
 Eg, Microsoft's closed source SAGE research
Tracing and evaluating the capabilities of binaries
 The closed source BitBlaze project implements BitScope, which is in a
similar vein to TTAnalyze except that it symbolically explores the many
program paths in potential malware to find its capabilities.
Symbolic Execution Implementation




The emulator runs the program instruction by instruction, generating
symbolic equations for instructions that have a symbolic source operand,
eg ebx = eax + 10 when eax is symbolic.
In an instruction, if a source operand is symbolic, destination becomes
symbolic.
 This is implemented using Dynamic Taint Analysis
At a conditional instruction there are two possible constraints: the
condition being true, or the condition being false.
Symbolic Execution explores each path separately.
 Each path is given a symbolic constraint recording the condition's
outcome, eg x > 10 on one path and x <= 10 on the other.
 The feasibility of each path, that is, whether its constraints can be
satisfied, is determined by an SMT solver.
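The forking at conditional instructions can be sketched with a tiny interpreter. This rests on several assumptions. The three-instruction language is hypothetical. There is a single 8-bit symbolic input x. Symbolic values are Python closures over x rather than formulas. Feasibility is decided by exhaustive search in place of an SMT solver. The toy programs have no backward jumps, so every path terminates; real tools must bound loops.

```python
import operator

BITS = 8
MASK = (1 << BITS) - 1
OPS = {"+": operator.add, "-": operator.sub,
       ">": operator.gt, "<": operator.lt, "==": operator.eq}

def operand(env, o):
    """Concrete ints become constant functions; names look up a symbolic value."""
    return (lambda x, c=o: c) if isinstance(o, int) else env[o]

def solve(constraints):
    """Brute-force stand-in for an SMT solver: some satisfying x, or None."""
    return next((x for x in range(1 << BITS) if all(c(x) for c in constraints)), None)

def explore(program):
    """Explore every feasible path; return [(bug message, triggering input)]."""
    findings = []
    worklist = [(0, {"x": lambda x: x}, [])]      # (pc, symbolic env, path constraints)
    while worklist:
        pc, env, path = worklist.pop()
        op, *args = program[pc]
        if op == "set":                           # dst = a OP b, wrapping at 8 bits
            dst, name, a, b = args
            f, g, fn = operand(env, a), operand(env, b), OPS[name]
            env = {**env, dst: lambda x, f=f, g=g, fn=fn: fn(f(x), g(x)) & MASK}
            worklist.append((pc + 1, env, path))
        elif op == "br":                          # if (a OP b) goto target
            name, a, b, target = args
            f, g, fn = operand(env, a), operand(env, b), OPS[name]
            cond = lambda x, f=f, g=g, fn=fn: fn(f(x), g(x))
            negated = lambda x, cond=cond: not cond(x)
            for c, nxt in ((cond, target), (negated, pc + 1)):
                if solve(path + [c]) is not None:  # fork only onto feasible paths
                    worklist.append((nxt, env, path + [c]))
        elif op == "bug":
            findings.append((args[0], solve(path)))
        # "halt" ends the path
    return findings
```

Running it on `[("set", "y", "+", "x", 10), ("br", ">", "y", 20, 3), ("halt",), ("br", "==", "x", 200, 5), ("halt",), ("bug", "null dereference")]` reports the bug together with the triggering input x = 200.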
Symbolic Execution Challenges




Symbolic Execution may never terminate in the presence of loops, so
loops must be simplified, typically through unrolling.
 Symbolic Execution therefore is not complete.
Path Explosion: functions like strcmp with symbolic input have many
possible paths; the number of paths can be exponential in the size of
the string.
 BitBlaze approach: Hard code 'function summaries' to deal with
common library functions.
Dealing with symbolic pointers.
 Dynamic taint analysis has trouble determining the target memory that
becomes tainted if a pointer is symbolic.
 This requires the SMT solver to determine concrete solutions for the
pointer.
SMT solver support for the target architecture may not be complete
 No public solvers support floating point.
Symbolic Execution Summary



Symbolic execution is a relatively new method to analyze programs.
Applications include bug checkers, smart fuzzers, and binary evaluation.
I believe symbolic execution will play a big part in the future of
automated analysis.
Part vi)
Detecting Runtime Errors in Programs
Valgrind





Valgrind is a heavyweight dynamic binary instrumentation framework.
 Most well known for the MemCheck checker.
 Memcheck used as a bug checker for incorrect heap use or access.
 Also detects uninitialized variable use.
Valgrind translates machine code to an IR, which is then instrumented by
modules that implement runtime checkers.
Valgrind's Memcheck can detect out of bounds or invalid heap access and
tracks what addresses can be accessed by maintaining a 'shadow memory'
mirroring allocations on the heap.
For each address in shadow memory, it also stores whether the data is
initialized or not.
It then checks, using IR instrumentation, that all guest memory references
fall within valid shadow memory.
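Below is a byte-granularity sketch of Memcheck's two kinds of shadow state, addressability and definedness. Valgrind tracks definedness down to individual bits and stores its shadow memory far more compactly. The Python sets and names here are illustrative assumptions only.

```python
class InvalidAccess(Exception):
    pass

class ShadowMemory:
    """Per-byte addressability (A) and definedness (V) state, Memcheck style."""

    def __init__(self):
        self.blocks = {}          # heap block start -> size
        self.addressable = set()
        self.defined = set()

    def malloc(self, ptr, size):
        self.blocks[ptr] = size
        self.addressable.update(range(ptr, ptr + size))   # accessible but undefined

    def free(self, ptr):
        for a in range(ptr, ptr + self.blocks.pop(ptr)):
            self.addressable.discard(a)
            self.defined.discard(a)

    def _check(self, addr, size, kind):
        if any(a not in self.addressable for a in range(addr, addr + size)):
            raise InvalidAccess(f"invalid {kind} of size {size} at {addr:#x}")

    def write(self, addr, size):
        self._check(addr, size, "write")
        self.defined.update(range(addr, addr + size))

    def read(self, addr, size):
        """Check addressability; return whether every byte read is defined."""
        self._check(addr, size, "read")
        return all(a in self.defined for a in range(addr, addr + size))
```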
Valgrind's MemCheck with uninitialized variables




Uninitialized variable checker implemented using dynamic taint analysis.
 Newly allocated memory and new stack frames considered tainted.
 Initializing data untaints it.
Alert when using tainted/uninitialized data.
Naive implementation causes false positives.
 Memcpy of padded structures or memcpy of structures with
uninitialized members causes false positives.
Fixed by warning only when using uninitialized variables in system calls,
conditions or being dereferenced as a pointer.
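The deferred reporting rule can be sketched by letting a definedness flag travel with each value and checking it only at the sinks listed above. The flag is kept per value here for brevity, whereas Memcheck tracks it per bit. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Val:
    value: int
    defined: bool      # False while the value derives from uninitialized memory

class UninitChecker:
    """Propagate definedness silently; warn only at the sinks."""

    def __init__(self):
        self.warnings: List[str] = []

    def copy(self, v: Val) -> Val:
        return v                                   # memcpy/assignment: no warning

    def add(self, a: Val, b: Val) -> Val:
        return Val(a.value + b.value, a.defined and b.defined)

    def use(self, v: Val, sink: str) -> int:
        """sink: eg 'conditional jump', 'syscall argument', 'pointer dereference'."""
        if not v.defined:
            self.warnings.append(sink)
        return v.value
```

Copying a padded structure moves undefined bytes around without a warning. An alert is raised only when such a value decides a branch, reaches a system call, or is dereferenced.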
Detecting Runtime Heap Errors in the Linux Kernel




Existing tools with similar designs or aims detect some classes of heap
errors in the Linux Kernel.
KEFence (Linux) / MemGuard (FreeBSD)
 Detects overflows (and underflows for KEFence, but not both at the
same time) of heap buffers.
 Allocates a guard page next to the allocated buffer that page faults on
any access.
 Only detects overflows, not arbitrary invalid access.
KmemCheck (Linux)
 Used to detect uninitialized variable bugs.
 Maintains a shadow memory indicating state of data being initialized
or not.
 Takes a page fault on every heap access, then checks the access against
shadow memory.
UML + Valgrind
 Doesn't seem active, and source unavailable :(
Linux Kernel MemCheck










My own runtime checker that detects out of bounds heap access in the
Linux Kernel.
Not Valgrind's MemCheck – I named it poorly I know.
Tested under Linux 2.6.26 using a Windows Vista Cygwin host.
Implemented as a C++ fork of QEMU.
Dumps a kernel stack trace on a guest access violation.
Only reports when a memory access violation occurs, much like Valgrind.
 Not a static analysis tool.
Host maintains 'shadow memory' of guest Linux Kernel heap that
identifies valid heap addresses.
The shadow memory is created by intercepting the heap management
functions in the Linux kernel and building a representation of the guest
heap.
MemCheck validates all memory access against this shadow memory (like
Valgrind).
The exception is heap management functions such as kmalloc and kfree,
which legitimately access heap metadata.
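The sketch below shows allocation-granularity checking in the spirit of MemCheck's shadow memory. It assumes the tracer reports new slab pages, allocations and frees, and that the name of the currently executing function is known. A sorted list with bisect stands in for the C++ STL map. Restricting checks to slab pages is an assumption made for this sketch, not a description of MemCheck's internals.

```python
import bisect

PAGE = 4096

class KernelHeapShadow:
    """Live slab objects keyed by start address, like a C++ std::map."""

    def __init__(self, allocator_funcs):
        self.allocator_funcs = set(allocator_funcs)  # eg {"kmalloc", "kfree"}
        self.slab_pages = set()                      # page numbers handed to the slab allocator
        self.starts = []                             # sorted object start addresses
        self.size = {}

    def on_new_slab(self, addr, npages=1):
        first = addr // PAGE
        self.slab_pages.update(range(first, first + npages))

    def on_alloc(self, ptr, size):
        bisect.insort(self.starts, ptr)
        self.size[ptr] = size

    def on_free(self, ptr):
        self.starts.remove(ptr)
        del self.size[ptr]

    def check(self, addr, length, current_func):
        """Return True if a guest access of `length` bytes at `addr` is allowed."""
        if current_func in self.allocator_funcs:
            return True                   # allocators manage their own metadata
        if addr // PAGE not in self.slab_pages:
            return True                   # not heap memory: outside this checker's scope
        i = bisect.bisect_right(self.starts, addr) - 1   # last object starting <= addr
        if i < 0:
            return False
        start = self.starts[i]
        return addr + length <= start + self.size[start]
```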
Linux Kernel Heap Management


Linux has had several memory allocators, the latest Linux kernels now
using the “slub” allocator.
 MemCheck only supports the latest “slub” allocator.
There are three internal allocators in Linux that manage the heap.
 The Page Allocator, using the buddy allocator internally, which only
handles allocations whose size is a power-of-two multiple of the page
size.
 The page allocator can be called directly or indirectly from the
slub allocator.
 The Slub Allocator which handles allocations of varying sizes by
dividing up a “slab” that originates from the page allocator.
 The BootMem Allocator, used only at boot time, which uses a simpler
algorithm than the other allocators.
Linux Kernel Heap Tracing and Guest Linux Implementation



MemCheck must trace the kernel allocator functions to properly create its
shadow memory.
However tracing an unmodified Linux guest presents problems.
 The Page Allocator does not always return the address of the allocated
page contents, but returns the struct page describing it instead.
 The Slub Allocator defines kmalloc as an inline function, which can't
be intercepted using a compile time symbol address.
 Following internal logic can be difficult, such as kmalloc using the
page allocator internally.
The solution is to use a modified guest Linux Kernel with instrumented
allocators that MemCheck can easily intercept.
MemCheck QEMU implementation

QEMU was modified to implement MemCheck.
 MemCheck is written in C++ running on a Windows host, so I ported
QEMU 0.9.1 to compile under g++. In hindsight, porting was not
necessary and not worth the effort. I also backported some patches
for problems that cause 0.9.1 to fail on Windows.
 QEMU has an optimization of merging basic blocks in a translation
block. I needed basic block granularity to correctly intercept the
beginning of functions so this QEMU optimization was turned off.
 A tracer was implemented to track functions using a callback interface
on function entry or exit.
 By tracing the heap management code, a simple shadow memory was
constructed using C++ STL maps for the implementation.
 The software MMU in QEMU was modified to check that each memory
access targets a valid address in the shadow memory.
MemChecking the Linux Kernel





The Linux Test Project (LTP) contains 3000+ tests for the Linux Kernel
which exercise much of the core kernel code.
Ran the default test suite on Linux 2.6.26.3 using MemCheck.
 MemCheck is slow, but still allows for interactive sessions.
 Fedora Linux takes 30+ minutes to boot.
 Let the test suite run overnight
No out of bounds access detected.
Reran the test suite using slub debugging which, in combination with
MemCheck, may result in more bugs being detected.
 Again, no out of bounds access detected.
While no immediate bugs were identified in 2.6.26.3, MemCheck may be
used against future kernel releases, possibly as part of an automated test
suite, or used to aid kernel debugging and development.
MemCheck Limitations



Because MemCheck is based on QEMU, very little hardware is emulated
so most of the Linux driver code is not tested.
Buffer overflows don't necessarily result in memory access using invalid
heap addresses.
 A slab based allocator fits heap allocations next to each other, so
buffers overflow into adjacent and valid heap allocations.
 A solution is to boot Linux using the slub_debug kernel option which
separates heap objects using a redzone.
If MemCheck generates a report from a vulnerable kernel module, only
kernel addresses are given in the stack trace; no symbolic names are used.
MemCheck TODO




A solution to the adjacent buffer problem is to associate every heap access
with its original allocation by tracking heap pointers using dynamic taint
analysis.
 This use of dynamic taint analysis could also be applied in userland, as
a Valgrind checker.
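One way to realise this idea is to attach the originating allocation to each pointer as its taint and propagate it through pointer arithmetic. This is a hypothetical sketch of that design, not an existing MemCheck feature. It shows how an overflow into an adjacent, valid allocation becomes detectable.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ptr:
    addr: int
    alloc: int          # id of the allocation this pointer derives from (its "taint")

class ProvenanceChecker:
    """Check each access against the allocation its pointer came from."""

    def __init__(self):
        self.allocs = {}     # allocation id -> (start, size)
        self.next_id = 0

    def malloc(self, start, size) -> Ptr:
        self.next_id += 1
        self.allocs[self.next_id] = (start, size)
        return Ptr(start, self.next_id)

    @staticmethod
    def offset(p: Ptr, delta: int) -> Ptr:
        return Ptr(p.addr + delta, p.alloc)       # pointer arithmetic keeps the tag

    def access(self, p: Ptr, length: int) -> bool:
        start, size = self.allocs[p.alloc]
        return start <= p.addr and p.addr + length <= start + size
```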
Dynamic taint analysis can also be the basis of tracking uninitialized
variable usage without the false positives currently associated with
kmemcheck.
Dynamic taint analysis could also be used to implement garbage
collection, which could be used to identify memory leaks at the exact
location of each leak.
Symbol names for addresses in kernel modules!
MemCheck Packages


http://silvio.cesare.googlepages.com/ For the package
http://silviocesare.wordpress.com/ For commentary on some of
MemCheck's internals.
Runtime Error Detection Summary



Existing tools for runtime error detection include Valgrind which detects
userland heap bugs.
Tools for the kernel exist such as kmemcheck which detects uninitialized
variables.
MemCheck is a new tool to detect heap bugs in the Linux Kernel, and
operates similarly to Valgrind.
That’s all folks…
A 2008 CQU Graduate looking for
interesting employment.
silvio.cesare@gmail.com