Log Based Dynamic Binary Analysis for Detecting Device Driver Defects Olatunji Ruwase

Log Based Dynamic Binary Analysis for Detecting
Device Driver Defects
Olatunji Ruwase
Thesis Proposal
Thesis Committee:
Todd C. Mowry (Chair)
David Andersen
Onur Mutlu
Brad Chen (Google)
Michael Swift (U. Wisconsin)
Carnegie Mellon
Device Drivers: The Good, The Bad, & The Ugly
• Good: Enable use of hardware devices
  • Kernel module in commodity OS
  • Distributed in binary form
• Bad: Poor code quality [Chou01, Murphy04]
  • Written by non-kernel experts
  • Poorly tested
• Ugly: Major cause of system failures
  • System crashes
  • OS corruption
  • Application corruption
  • Device damage
Detect bugs in production driver executions
Carnegie Mellon
7/17/2016
Log Based Dynamic Binary Analysis for Detecting Device Driver Defects
2
Program Monitoring Using Lifeguards
[Diagram: a Lifeguard monitoring the instruction stream of a program]
  …
  eax = X
  edx = eax
  Y = edx + 1
  jmp ecx
  …
• Lifeguards: dynamic correctness checking tools
  • Dynamic binary analysis to work on unmodified binaries
  • Instruction-grained analysis to catch subtle bugs
  • Versatility to catch a broad range of bugs
    • Memory [Nethercote07]
    • Security [Newsome05, Castro05]
    • Concurrency [Savage97, Yu05, Flanagan09]
    • Multilingual program interface [Lee10]
Can Lifeguards be used to catch driver bugs?
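The instruction sequence on this slide can illustrate what an instruction-grained lifeguard does. The sketch below is a minimal taint-tracking check in the spirit of the security lifeguards cited [Newsome05]. The trace format, the name taint_lifeguard, and the choice of tainted inputs are assumptions made for illustration; none of them come from the proposal.

```python
# Hypothetical trace format: (opcode, destination, sources).
TRACE = [
    ("mov", "eax", ["X"]),     # eax = X
    ("mov", "edx", ["eax"]),   # edx = eax
    ("add", "Y", ["edx"]),     # Y = edx + 1 (the constant carries no taint)
    ("jmp", None, ["ecx"]),    # jmp ecx
]


def taint_lifeguard(trace, tainted_inputs):
    """Propagate taint per instruction; report jumps through tainted targets."""
    tainted = set(tainted_inputs)
    alerts = []
    for index, (opcode, dest, sources) in enumerate(trace):
        source_tainted = any(src in tainted for src in sources)
        if opcode == "jmp":
            if source_tainted:
                alerts.append((index, sources[0]))
            continue
        if source_tainted:
            tainted.add(dest)
        else:
            tainted.discard(dest)  # destination overwritten with clean data
    return tainted, alerts


# X is untrusted: taint flows to eax, edx and Y, but the jump target is clean.
tainted, alerts = taint_lifeguard(TRACE, {"X"})
# ecx is untrusted: the indirect jump is reported.
tainted_ecx, alerts_ecx = taint_lifeguard(TRACE, {"ecx"})
```

The lifeguard sees every instruction, which is why it can follow the flow from X through two registers into Y without any help from the monitored program.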
Why Drivers Are Difficult To Write Correctly [Ryzhyk09_Dingo]
• Concurrency issues
  • Reentrant interrupt handling
• Interface issues
  • Network stack
  • Kernel resources
  • Hardware device
• Generic C language issues
  • Memory management
Lifeguards effectively detect a similar spectrum of issues in applications
[Diagram: user space above the system call boundary; in kernel space, the driver sits below the upper layers of the network stack and beside kernel resource management. Synchronous interactions: main memory & CPU registers. Asynchronous interactions: I/O memory & interrupts.]
Potential Uses of Driver Lifeguards
• Diagnosing system failures
  • Test sites
  • Customer sites
• Detecting “silent” faults
  • Test sites
  • Customer sites
Outline
 Motivation
 Overview of Lifeguard Deployment
 Thesis Question
 Related work
 Research Challenges
 Preliminary work
 Current and Future work
 Timeline
Lifeguard Deployment Approaches
• Dynamic Binary Instrumentation [PIN, VALGRIND]
  • Fault isolation
  • Imprecise checking of parallel execution
[Diagram: monitored program and Lifeguard sharing the same memory]
Lifeguard Deployment Approaches
• Dynamic Binary Instrumentation [PIN, VALGRIND]
  • Fault isolation
  • Imprecise checking of parallel execution
  [Diagram: monitored program instrumented with check_store(p) before the access *p; p = … NULL; program and Lifeguard share the same memory]
• Logging [AFTERSIGHT, LBA, SPECK]
  • Monitor parallel execution [Pokam09, Vlachos10]
  • Accelerate lifeguard execution [Chen08, Nightingale08, Ruwase08, Ruwase10]
  ✘ Require fault containment
    • Protect Lifeguard
    • Restrict damages to faulting program
  [Diagram: multithreaded monitored program → execution trace → Lifeguard]
Log-Based Lifeguards are more promising for monitoring kernel-mode drivers
Thesis Questions
• Can Log Based Lifeguards precisely detect faults in the executions of device drivers?
  • Can Log Based monitoring be adapted for drivers?
  • Will the Lifeguards be efficient enough for production systems (Mobile, Desktop, Cloud)?
Outline
 Motivation
 Overview of Lifeguard Deployment
 Thesis Question
 Related work
 Research Challenges
 Preliminary work
 Current and Future work
 Timeline
Eliminating Driver Faults During Development
• Avoid overheads of runtime fault detection or isolation
✖ Cannot find all faults in production drivers
• Static analysis [Metal, RacerX, SLAM]
  ✖ Drivers are too complex
• Testing [DDT]
  ✖ Drivers have too many execution paths
• Synthesize driver code [Termite]
  ✖ Cannot synthesize complex features, e.g. multithreading
• Lifeguards to detect other faults
  • Customer sites
  • Testing sites
[Diagram: driver below the upper layers of the network stack, beneath the syscall boundary]
Using Existing Hardware to Isolate Driver Faults
• Prevent system failures due to driver faults
✖ Little information on driver faults
• Page table permissions [Nooks]
• User space drivers [Microdrivers, SUD]
• Lifeguards on customer systems
  • Pinpoint fault location to aid debugging
  • Detect “silent” driver faults
[Diagram: driver below the upper layers of the network stack, beneath the syscall boundary]
Checking Driver Execution to Isolate Faults
• Pinpoint fault location
• Detect “silent” faults
• Instrumented software checks [SafeDrive, XFI, BGI]
  • Imprecise on parallel execution
  • Only memory faults studied
  • Logging works for parallel execution
  • Lifeguards for high-level faults
• Hardware breakpoints [DataCollider]
  • Sampling approach misses real faults
  • Lifeguard finds all faults in execution
[Diagram: driver below the upper layers of the network stack, beneath the syscall boundary]
Related Work Summary
• Eliminating driver faults during development
  • Static analysis [Metal, RacerX, SLAM]
  • Testing [DDT]
  • Synthesizing driver code [Termite]
• Using existing hardware to isolate driver faults
  • Page table permissions [Nooks]
  • User space drivers [Microdrivers, SUD]
• Checking driver execution to isolate faults
  • Instrumented software checks [SafeDrive, XFI, BGI]
  • Hardware breakpoints [DataCollider]
Outline
 Motivation
 Overview of Lifeguard Deployment
 Thesis Question
 Related work
 Research Challenges
 Preliminary work
 Current and Future work
 Timeline
Research Challenges
• Preliminary work
  • Adapting Log Based Monitoring for Drivers
  • Understanding Device Drivers
• Current and Future work
  • Detecting Common Driver Faults (Driver Lifeguards)
  • Efficiency of Driver Lifeguards
Log Based Architectures (LBA) [Chen 08]
[Diagram: Program and Lifeguard run above the operating system and are connected by a hardware log]
Simulated LBA Design
• Execution logging
  • Toggle when monitored thread (de)scheduled
• Fault containment
  • Lifeguard as separate process
  • Block program at system calls until Lifeguard catches up
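The two mechanisms on this slide, toggling the log on scheduling events and stalling the program at system calls, can be sketched in a few lines. This is a single-threaded simulation under stated assumptions: the class names LogChannel and Lifeguard, the bounded capacity, and the string records are all hypothetical and do not describe the actual LBA hardware interface.

```python
from collections import deque


class Lifeguard:
    """Consumes log records; a real lifeguard would check each one."""

    def __init__(self):
        self.checked = []

    def drain(self, channel, count):
        for _ in range(count):
            self.checked.append(channel.records.popleft())


class LogChannel:
    """Bounded log between a monitored program and a lifeguard."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = deque()
        self.logging_enabled = False

    def schedule(self, is_monitored_thread):
        # Toggle logging when the monitored thread is (de)scheduled.
        self.logging_enabled = is_monitored_thread

    def append(self, record, lifeguard):
        if not self.logging_enabled:
            return
        if len(self.records) >= self.capacity:
            lifeguard.drain(self, 1)  # log full: program waits for the lifeguard
        self.records.append(record)

    def system_call(self, lifeguard):
        # Fault containment: the program may not enter the kernel until
        # every earlier record has been checked.
        lifeguard.drain(self, len(self.records))


channel, guard = LogChannel(capacity=2), Lifeguard()
channel.schedule(True)
for record in ["load A", "store B", "store C"]:
    channel.append(record, guard)
channel.system_call(guard)             # lifeguard catches up before the syscall
channel.schedule(False)
channel.append("other thread", guard)  # not logged: monitored thread descheduled
```

Stalling at system calls bounds how far an undetected fault can propagate: nothing the program does becomes visible outside its process until the lifeguard has examined the instructions that led to it.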

Adapting Execution Logging for Driver Monitoring
• Toggle point
  • Complete information for precise fault detection
  • Efficient
    • Modest storage and bandwidth costs
    • No lifeguard filtering costs
[Diagram: driver below the upper layers of the network stack, beneath the system call boundary; candidate toggle points are ordered by difficulty]
Adapting Execution Logging for Driver Monitoring
Option       Toggle         Complete   Efficient
Kernel       Ring change    ✔          ✗           [AFTERSIGHT]
I/O stack
Driver
[Diagram: driver below the upper layers of the network stack, beneath the system call boundary; the options are ordered along a DIFFICULTY axis]
Adapting Execution Logging for Driver Monitoring
Option       Toggle         Complete   Efficient
Kernel       Ring change    ✔          ✗
I/O stack    I/O syscall    ✔          ✗
Driver
[Diagram: driver below the upper layers of the network stack, beneath the system call boundary; the options are ordered along a DIFFICULTY axis]
Adapting Execution Logging for Driver Monitoring
Option       Toggle         Complete   Efficient
Kernel       Ring change    ✔          ✗
I/O stack    I/O syscall    ✔          ✗
Driver       Code region    ✔          ✔
• Identify driver entry points at load time
[Diagram: driver below the upper layers of the network stack, beneath the system call boundary; the options are ordered along a DIFFICULTY axis]
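The code-region option can be approximated in software to show why it is both complete and efficient: only instructions whose program counter falls inside the driver's loaded code are logged. The sketch below assumes the driver's address ranges are known at load time. The helper names and the example addresses are hypothetical, and the proposal's actual toggle mechanism may differ.

```python
def make_region_filter(driver_regions):
    """driver_regions: list of (start, end) address ranges, end exclusive."""
    def in_driver(pc):
        return any(start <= pc < end for start, end in driver_regions)
    return in_driver


def log_driver_execution(pcs, in_driver):
    """Return the logged PCs plus the toggle events at driver entry and exit."""
    log, toggles, logging = [], [], False
    for pc in pcs:
        inside = in_driver(pc)
        if inside != logging:
            toggles.append(("on" if inside else "off", pc))
            logging = inside
        if logging:
            log.append(pc)
    return log, toggles


# One driver code region; the kernel runs before, between and after driver code.
in_driver = make_region_filter([(0x1000, 0x2000)])
log, toggles = log_driver_execution(
    [0x0500, 0x1000, 0x1004, 0x3000, 0x1FF8], in_driver
)
```

Because filtering happens before records reach the log, the lifeguard never spends time discarding kernel or network-stack instructions, which is the "no lifeguard filtering costs" property listed for an efficient toggle point.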
Adapting Fault Containment for Driver Monitoring
[Diagram: Driver and Lifeguard run above the operating system and are connected by a hardware log]
• Execution logging
  • Toggle when monitored thread (de)scheduled
• Fault containment
  • Lifeguard as separate process
  • Block program at system calls until Lifeguard catches up
Adapting Fault Containment for Driver Monitoring
[Diagram: Driver and Lifeguard each run on their own OS, connected by a hardware log]
• Virtual Machine (VM) separation to protect Lifeguard [AFTERSIGHT]
  • Rest of system remains vulnerable to driver faults
  • Overhead of VM is high
Understanding Device Drivers
[Diagram: driver between the upper layers of the network stack and the PCI bus, beneath the syscall boundary]
• Network Functions
  • hard_start_xmit()
  • irq_handler()
  • open()
  • stop()
  • get_stats()
  • ...
• PCI Bus Functions
  • probe()
  • remove()
• Required Functions
  • module_init()
  • module_cleanup()
Adapting Data Race Lifeguard for Network Drivers
Thread 1: Lock(Mx); Write(X); Fork(Thread2); Unlock(Mx)
Thread 2: Lock(Mx); Read(X); Unlock(Mx)
• Data race on X
  • Two accesses to X where at least one access is a write
  • No explicit synchronization between the accesses
• Lockset algorithm for detecting races in applications [Eraser]
  • Shared data protected with a consistent set of locks
  • Happens-before relation for non-lock synch. (e.g. fork) [RaceTrack]
• Lockset + kernel synch (interrupts, spinlocks) = KernelEraser
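The lockset idea can be sketched as follows. This is a simplified illustration of the Eraser-style refinement, not KernelEraser itself: it keeps a candidate lock set per shared variable and reports a race once that set is empty and the variable has been written while shared. The class and method names are assumptions, and the happens-before and kernel synchronization extensions named on the slide are left out.

```python
class LocksetLifeguard:
    """Simplified Eraser-style lockset checker (illustrative only)."""

    def __init__(self):
        self.held = {}        # thread -> locks currently held
        self.owner = {}       # variable -> first thread to touch it
        self.lockset = {}     # variable -> candidate locks once shared
        self.written = set()  # variables written after becoming shared
        self.races = set()

    def lock(self, thread, lock):
        self.held.setdefault(thread, set()).add(lock)

    def unlock(self, thread, lock):
        self.held.setdefault(thread, set()).discard(lock)

    def access(self, thread, variable, is_write):
        held = self.held.get(thread, set())
        if variable not in self.owner:
            self.owner[variable] = thread  # first access: exclusive to thread
            return
        if variable not in self.lockset:
            if self.owner[variable] == thread:
                return                     # still exclusive, nothing to check
            self.lockset[variable] = set(held)
        else:
            self.lockset[variable] &= held  # keep only consistently held locks
        if is_write:
            self.written.add(variable)
        if variable in self.written and not self.lockset[variable]:
            self.races.add(variable)


guard = LocksetLifeguard()
# The slide's example: both threads access X under Mx, so no race is reported.
guard.lock("T1", "Mx"); guard.access("T1", "X", True); guard.unlock("T1", "Mx")
guard.lock("T2", "Mx"); guard.access("T2", "X", False); guard.unlock("T2", "Mx")
races_after_x = set(guard.races)
# A second variable written by two threads with no lock held is reported.
guard.access("T1", "Y", True)
guard.access("T2", "Y", True)
```

A checker of this kind only sees locks that appear in its own event stream, which is why synchronization performed outside the monitored code can show up as a reported race.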
Network Driver Races Reported by KernelEraser
Classification of Races (simulated LBA environment)

Driver   Serious   Benign   False Alarm:        False Alarm:     Total
                            Net stack synch.    Device synch.
tg3      2*        15       13                  1533             1563
tulip    0         0        472                 451              923
* Fixed in versions 2.6.18 & 2.6.21

• Kernel version: Linux 2.6.17.1
• Drivers: tg3 & tulip
• Driver class: Network
• Bus: PCI
• Driver VM: 2 CPU
• Lifeguard VM: 1 CPU
Workload
• Load driver
• Enable Ethernet
• Transfer file over network
• Disable Ethernet
• Unload driver
False Alarms due to Unobserved Invariants
• Synchronizations due to device states
• Synchronizations in upper layers of I/O stack
[Diagram: tg3 driver between the upper layers of the network stack and the PCI bus, beneath the syscall boundary]
Upper layers of network stack:
  Lock(rtnl_lock);
  driver->open();
  Unlock(rtnl_lock);
  …
  Lock(rtnl_lock);
  driver->stop();
  Unlock(rtnl_lock);
Driver (tg3):
  stop () {
    …
    while (tp->tg3_flags & …)
    …
  }
  open () {
    …
    tp->tg3_flags &= …
    …
  }
False Alarms due to Unobserved Invariants
• Synchronizations due to device states
• Synchronizations in upper layers of I/O stack
[Diagram: tg3 driver between the upper layers of the network stack and the PCI bus, beneath the syscall boundary]
[Diagram: device states: inactive → probe() → connected to pci bus → open() → ready for pkt rx/tx]
Driver (tg3):
  probe () {
    …
    tp->tg3_flags |= …
    …
  }
  open () {
    …
    tp->tg3_flags &= …
    …
  }
Preliminary Work Summary
• Adapted Log Based Monitoring for Drivers
  • Identify driver code region to log only driver execution
  • VM separation to protect Lifeguard
• Adapted Lockset (KernelEraser) to detect races in network drivers
  • Found 2 known but serious data races in tg3
  • False alarms due to external synchronizations
Outline
 Motivation
 Overview of Lifeguard Deployment
 Thesis Question
 Related work
 Research Challenges
 Preliminary work
 Current and Future work
 Timeline
Eliminating False Alarms in KernelEraser
+ External synchronizations
  • Network stack
    × Log network stack
    • Emulate interface invariants
[Diagram: driver below the upper layers of the network stack, beneath the system call boundary]
Driver entry points as seen by the Lifeguard, with the network stack's locking emulated:
  stop () {
    Lock(rtnl_lock);
    …
    while (tp->tg3_flags & …)
    …
    Unlock(rtnl_lock);
  }
  open () {
    Lock(rtnl_lock);
    …
    tp->tg3_flags &= …
    …
    Unlock(rtnl_lock);
  }
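Emulating an interface invariant amounts to treating a lock as held on entry to certain driver functions, even though the acquisition happened in code that was never logged. The sketch below assumes a table mapping entry points to locks the network stack holds around them. The table contents beyond open() and stop() with rtnl_lock, and both function names, are hypothetical.

```python
# Assumption: entry points the network stack invokes with rtnl_lock held.
IMPLICIT_LOCKS = {"open": {"rtnl_lock"}, "stop": {"rtnl_lock"}}


def effective_lockset(entry_point, locks_seen_in_log):
    """Locks the lifeguard should treat as held inside a driver entry point."""
    return set(locks_seen_in_log) | IMPLICIT_LOCKS.get(entry_point, set())


def share_a_lock(first, second):
    """Each argument is (entry_point, locks_seen_in_log) for one access."""
    return bool(effective_lockset(*first) & effective_lockset(*second))


# Accesses to the flags word in open() and stop(): no lock appears in the
# driver's own log, but both run under the emulated rtnl_lock.
open_vs_stop = share_a_lock(("open", set()), ("stop", set()))
# An interrupt handler is not covered by the invariant, so it stays suspect.
open_vs_irq = share_a_lock(("open", set()), ("irq_handler", set()))
```

The log stays limited to driver execution, and the cost of the invariant is a table lookup at each entry point rather than tracing the whole network stack.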
Eliminating False Alarms in KernelEraser
+ External synchronizations
  • Network stack
    × Log network stack
    • Emulate interface invariants
  • Device
    • Model finite state machine
[Diagram: driver between the network stack and the device, beneath the system call boundary]
[Diagram: device states: inactive → probe() → connected to pci bus → open() → ready for pkt rx/tx]
  probe () {
    (INACTIVE)
    …
    tp->tg3_flags |= …
    …
    (CONNECTED TO BUS)
  }
  open () {
    (CONNECTED TO BUS)
    …
    tp->tg3_flags &= …
    …
    (READY FOR TX/RX)
  }

Driver   Serious   Benign   False Alarm:        False Alarm:     Total
                            Net stack synch.    Device synch.
tg3      2*        15       0                   0                17
tulip    0         0        0                   0                0
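The device state machine can be written down explicitly to show how it orders entry points. The sketch below encodes only the two transitions drawn on the slide. The identifier names and the may_run_concurrently helper are assumptions, and a real model would need more states and transitions (for example for stop() and remove()).

```python
# Transitions taken from the slide: (current state, entry point) -> next state.
TRANSITIONS = {
    ("INACTIVE", "probe"): "CONNECTED_TO_BUS",
    ("CONNECTED_TO_BUS", "open"): "READY_FOR_TX_RX",
}


class DeviceStateMachine:
    def __init__(self):
        self.state = "INACTIVE"

    def run(self, entry_point):
        key = (self.state, entry_point)
        if key not in TRANSITIONS:
            raise ValueError(f"{entry_point}() not allowed in state {self.state}")
        self.state = TRANSITIONS[key]


def may_run_concurrently(first, second):
    """Two entry points can overlap only if some state enables both."""
    states = {state for state, _ in TRANSITIONS}
    return any(
        (state, first) in TRANSITIONS and (state, second) in TRANSITIONS
        for state in states
    )


device = DeviceStateMachine()
device.run("probe")
device.run("open")
# probe() and open() are enabled in different states, so their accesses to the
# flags word are ordered by the device state and need not be reported.
probe_open_overlap = may_run_concurrently("probe", "open")
```

Under this model an access pair is suppressed only when no single state enables both entry points, so races between functions that really can overlap are still reported.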
Eliminating False Alarms in KernelEraser
+ External synchronizations
  • Network stack
    × Log network stack
    • Emulate interface invariants
  • Device
    • Model finite state machine
+ Other driver classes
  • SCSI disk
  • SOUND
  • USB
  • GRAPHICS
[Diagram: driver below the upper layers of the network stack, beneath the system call boundary]
Lifeguards for Common Driver Faults [Ryzhyk09_Dingo]
• Concurrency faults
  • Data Races
• Memory faults
  • Illegal memory access
  • Memory leaks
  • Uninitialized memory use
• Interface violations
  • Device protocol
  • Kernel protocol
  • I/O stack protocol
Scalability?
[Diagram: user space above the system call boundary; in kernel space, the network driver sits below the upper layers of the network stack and beside the kernel resource managers]
Efficiency of Driver Lifeguards
• Accelerating Lifeguard analysis
  • Static analysis
  • Dynamic optimizations
  • Parallel Lifeguards
  • Hardware accelerators
• Reduce overhead of VM fault containment
  • Hardware enforced fault isolation in same VM
Accelerating Driver Lifeguards
• Reduce analysis workload
  • Static analysis [XFI]
[Diagram: Driver and Lifeguard each run on their own OS, connected by a hardware log]
Accelerating Driver Lifeguards
• Reduce analysis workload
  • Static analysis [XFI]
• Run analysis faster
  • Dynamic compiler optimizations [Qin06, Ruwase10]
  • Parallel Lifeguards [Nightingale08, Ruwase08]
  • Hardware accelerators [Vlachos10]
[Diagram: Driver and Lifeguard each run on their own OS, connected by a hardware log]
Avoid Overhead of VM Fault Containment
• Hardware enforced fault isolation [Nooks, SUD]
• Issues to consider
  • Protection quality
  • Lifeguard using Driver (e.g. disk)
[Diagram: user space above the system call boundary; in kernel space, the network driver and the Lifeguard sit below the upper layers of the network stack and beside the kernel resource managers]
Current and Future Work Summary
• Detecting common driver faults
  • Data races
  • Memory
  • Interface violations
• Efficiency of Driver Lifeguards
  • Accelerating Lifeguard analysis
  • More efficient fault containment
Outline
 Motivation
 Overview of Lifeguard Deployment
 Thesis Question
 Related work
 Research Challenges
 Preliminary work
 Current and Future work
 Timeline
Timeline
[Gantt chart; time axis in quarters: Dec-05, Mar-06, Jun-06, Sep-06, Dec-06, Mar-07, Jun-07, Sep-07, Dec-07, Mar-08, Jun-08]
• Logging Driver Execution
• Understanding Drivers
• Thesis Proposal
• Data Races
• Memory Faults
• OS protocol violations
• Device protocol violations
• Performance studies
• Thesis Writing
Questions?
Thanks to members of the LBA Group for their contributions
• Shimin Chen
• Babak Falsafi
• Phillip Gibbons
• Michelle Goodstein
• Michael Kozuch
• Onur Mutlu
• Todd Mowry
• Gennady Pekhimenko
• Vivek Seshadri
• Theodoros Strigkos
• Evangelos Vlachos