Log Based Dynamic Binary Analysis for Detecting Device Driver Defects
Olatunji Ruwase
Thesis Proposal
Thesis Committee: Todd C. Mowry (Chair), David Andersen, Onur Mutlu, Brad Chen (Google), Michael Swift (U. Wisconsin)
Carnegie Mellon

Device Drivers: The Good, The Bad, & The Ugly
Good: Enable use of hardware devices
  • Kernel module in commodity OS
  • Distributed in binary form
Bad: Poor code quality [Chou01, Murphy04]
  • Written by non kernel experts
  • Poorly tested
Ugly: Major cause of system failures
  • System crashes
  • OS corruption
  • Application corruption
  • Device damage
Detect bugs in production driver executions

Carnegie Mellon 7/17/2016 Log Based Dynamic Binary Analysis for Detecting Device Driver Defects

Program Monitoring Using Lifeguards
[Figure: a Lifeguard observing the monitored program's instruction stream: … eax = X; edx = eax; Y = edx + 1; jmp ecx …]
Can Lifeguards be used to catch driver bugs?
Lifeguards: dynamic correctness checking tools
  • Dynamic binary analysis to work on unmodified binaries
  • Instruction grained analysis to catch subtle bugs
  • Versatility to catch a broad range of bugs
    • Memory [Nethercote07]
    • Security [Newsome05, Castro05]
    • Concurrency [Savage97, Yu05, Flanagan09]
    • Multilingual program interface [Lee10]

Why Drivers Are Difficult To Write Correctly [Ryzhyk09_Dingo]
• Concurrency issues
  • Reentrant interrupt handling
• Interface issues
  • Network stack
  • Kernel resources
  • Hardware device
• Generic C language issues
  • Memory management
Lifeguards effectively detect a similar spectrum of issues in applications
[Figure: user space above the system call boundary; in kernel space, the driver sits below the upper layers of the network stack and kernel resource management. Driver interactions are synchronous (main memory & CPU registers) and asynchronous (I/O memory & interrupts).]

Potential Uses of Driver Lifeguards
Diagnosing system failures
  • Test sites
  • Customer sites
Detecting “silent” faults
  • Test sites
  • Customer sites

Outline
  • Motivation
  • Overview of Lifeguard Deployment
  • Thesis Question
  • Related work
  • Research Challenges
    • Preliminary work
    • Current and Future work
  • Timeline

Lifeguard Deployment Approaches
Dynamic Binary Instrumentation [PIN, VALGRIND]
  • Fault isolation
  • Imprecise checking of parallel execution
  [Figure: monitored program, memory, Lifeguard]
Logging [AFTERSIGHT, LBA, SPECK]
  • Monitor parallel execution [Pokam09, Vlachos10]
  • Accelerate lifeguard execution [Chen08, Nightingale08, Ruwase08, Ruwase10]
  ✘ Require fault containment
    ✘ Protect Lifeguard
    ✘ Restrict damages to faulting program
  [Figure: a multithreaded monitored program (fragment: p = NULL, …, *p) produces an execution trace that the Lifeguard checks, e.g. check_store(p)]
Log Based Lifeguards are more promising for monitoring kernel mode drivers

Thesis Questions
  • Can Log Based Lifeguards precisely detect faults in the executions of device drivers?
  • Can Log Based monitoring be adapted for drivers?
  • Will the Lifeguards be efficient enough for production systems (Mobile, Desktop, Cloud)?

Eliminating Driver Faults During Development
Avoid overheads of runtime fault detection or isolation
✖ Cannot find all faults in production drivers
  • Static analysis [Metal, RacerX, SLAM]
    ✖ Drivers are too complex
  • Testing [DDT]
    ✖ Drivers have too many execution paths
  • Synthesize driver code [Termite]
    ✖ Cannot synthesize complex features, e.g. multithreading
Lifeguards to detect other faults
  • Customer sites
  • Testing sites

Using Existing Hardware to Isolate Driver Faults
Prevent system failures due to driver faults
✖ Little information on driver faults
  • Page table permissions [Nooks]
  • User space drivers [Microdrivers, SUD]
Lifeguards on customer systems
  • Pinpoint fault location to aid debugging
  • Detect “silent” driver faults

Checking Driver Execution to Isolate Faults
Pinpoint fault location
Detect “silent” faults
  • Instrumented software checks [SafeDrive, XFI, BGI]
    • Imprecise on parallel execution
    • Only memory faults studied
    • Logging works for parallel execution
    • Lifeguards for high level faults
  • Hardware breakpoints [DataCollider]
    • Sampling approach misses real faults
    • Lifeguard finds all faults in execution

Related Work Summary
Eliminating Driver faults during development
  • Static analysis [Metal, RacerX, SLAM]
  • Testing [DDT]
  • Synthesizing driver code [Termite]
Using existing hardware to isolate Driver faults
  • Page table permissions [Nooks]
  • User space drivers [Microdrivers, SUD]
Checking Driver execution to isolate faults
  • Instrumented software checks [SafeDrive, XFI, BGI]
  • Hardware breakpoints [DataCollider]

Research Challenges
Preliminary work
  • Adapting Log Based Monitoring for Drivers
  • Understanding Device Drivers
Current and Future work
  • Detecting Common Driver Faults (Driver Lifeguards)
  • Efficiency of Driver Lifeguards

Log Based Architectures (LBA) [Chen08]
[Figure: program and Lifeguard running on the operating system, connected through a hardware log]
Simulated LBA Design
Execution logging
  • Toggle when monitored thread (de)scheduled
Fault containment
  • Lifeguard as separate process
  • Block program at system calls until Lifeguard catches up

Adapting Execution Logging for Driver Monitoring
Toggle point
  • Complete: information for precise fault detection
  • Efficient
    • Modest storage and bandwidth costs
    • No lifeguard filtering costs

  Option      Toggle        Complete   Efficient
  Kernel      Ring change   ✔          ✗          [AFTERSIGHT]
  I/O stack   I/O syscall   ✔          ✗
  Driver      Code region   ✔          ✔

Identify driver entry points at load time
[Figure: the three options (kernel, I/O stack, driver) placed along a Difficulty axis beside the network stack below the system call boundary]

Adapting Fault Containment for Driver Monitoring
[Figure: driver and Lifeguard each running on its own OS, connected through a hardware log]
  • Virtual Machine (VM) separation to protect Lifeguard [AFTERSIGHT]
  • Rest of system remains vulnerable to driver faults
  • Overhead of VM is high

Understanding Device Drivers
PCI Driver
  • Network Functions: hard_start_xmit(), irq_handler(), open(), stop(), get_stats(), ...
  • PCI Bus Functions: probe(), remove()
  • Required Functions: module_init(), module_cleanup()

Adapting Data Race Lifeguard for Network Drivers
[Figure: Thread 1: Lock(Mx), Write(X), Fork(Thread2), Unlock(Mx). Thread 2: Lock(Mx), Read(X), Unlock(Mx).]
Data race on X
  • Two accesses to X where at least one access is a write
  • No explicit synchronization between the accesses
Lockset algorithm for detecting races in applications [Eraser]
  • Shared data protected with consistent set of locks
  • Happens-before relation for non-lock synchronization (e.g. fork) [RaceTrack]
Lockset + kernel synchronization (interrupts, spinlocks) = KernelEraser

Network Driver Races Reported by KernelEraser

  Driver   Serious   Benign   False alarm:       False alarm:    Total
                              net stack synch.   device synch.
  tg3      2*        15       13                 1533            1563
  tulip    0         0        472                451             923

* Fixed in versions 2.6.18 & 2.6.21
Simulated LBA environment
  • Kernel version: Linux 2.6.17.1
  • Drivers: tg3 & tulip
  • Driver class: Network
  • Bus: PCI
  • Driver VM: 2 CPU
  • Lifeguard VM: 1 CPU
Workload
  • Load driver
  • Enable Ethernet
  • Transfer file over network
  • Disable Ethernet
  • Unload driver

False Alarms due to Unobserved Invariants
Synchronizations in upper layers of I/O stack
    Lock(rtnl_lock); driver->open(); Unlock(rtnl_lock);
    …
    Lock(rtnl_lock); driver->stop(); Unlock(rtnl_lock);

    stop () { … while(tp->tg3_flags & …) … }
    open () { … tp->tg3_flags &= … … }
Synchronizations due to device states
    inactive --probe()--> connected to pci bus --open()--> ready for pkt rx/tx

    probe () { … tp->tg3_flags |= … … }
    open () { … tp->tg3_flags &= … … }

Preliminary Work Summary
Adapted Log Based Monitoring for Drivers
  • Identify driver code region to log only driver execution
  • VM separation to protect Lifeguard
Adapted Lockset (KernelEraser) to detect races in network drivers
  • Found 2 known but serious data races in tg3
  • False alarms due to external synchronizations

Eliminating False Alarms in KernelEraser
+ External synchronizations
  Network stack
    × Log network stack
    • Emulate interface invariants
        stop () { Lock(rtnl_lock); … while(tp->tg3_flags & …) … Unlock(rtnl_lock); }
        open () { Lock(rtnl_lock); … tp->tg3_flags &= … … Unlock(rtnl_lock); }
  Device
    • Device Model finite state machine
        inactive --probe()--> connected to pci bus --open()--> ready for pkt rx/tx
        probe () { (INACTIVE) … tp->tg3_flags |= … … (CONNECTED TO BUS) }
        open () { (CONNECTED TO BUS) … tp->tg3_flags &= … … (READY FOR TX/RX) }

  Driver   Serious   Benign   False alarm:       False alarm:    Total
                              net stack synch.   device synch.
  tg3      2*        15       0                  0               17
  tulip    0         0        0                  0               0

+ Other driver classes
  • SCSI disk
  • SOUND
  • USB
  • GRAPHICS

Lifeguards for Common Driver Faults [Ryzhyk09_Dingo]
• Concurrency faults
  • Data Races
• Memory faults
  • Illegal memory access
  • Memory leaks
  • Uninitialized memory use
• Interface violations
  • Device protocol
  • Kernel protocol
  • I/O stack protocol
Scalability?
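The lockset idea named on the data race slides (shared data protected by a consistent set of locks [Eraser]) can be illustrated with a minimal sketch. This shows only basic lockset refinement and is not KernelEraser: it leaves out the happens-before relation and the kernel synchronization (interrupts, spinlocks) that the slides list. The class name LocksetChecker, its methods, and the thread and variable names are hypothetical and chosen only for illustration.

```python
# Simplified lockset refinement in the style of Eraser (illustrative sketch).
# LocksetChecker and its method names are hypothetical, not from the slides.

class LocksetChecker:
    def __init__(self):
        self.held = {}        # thread id -> set of locks currently held
        self.candidates = {}  # variable -> locks held on every access so far
        self.races = []       # variables whose candidate set became empty

    def acquire(self, thread, lock):
        self.held.setdefault(thread, set()).add(lock)

    def release(self, thread, lock):
        self.held.setdefault(thread, set()).discard(lock)

    def access(self, thread, var):
        locks = self.held.get(thread, set())
        if var not in self.candidates:
            # First access: every lock held now is a candidate protector.
            self.candidates[var] = set(locks)
        else:
            # Later accesses: keep only the locks that are held again.
            self.candidates[var] &= locks
        # An empty candidate set means no single lock protects the variable.
        if not self.candidates[var] and var not in self.races:
            self.races.append(var)


checker = LocksetChecker()

# X is always accessed under Mx, so its candidate set stays {Mx}.
checker.acquire("T1", "Mx"); checker.access("T1", "X"); checker.release("T1", "Mx")
checker.acquire("T2", "Mx"); checker.access("T2", "X"); checker.release("T2", "Mx")

# Y is accessed under Mx by T1 but with no lock held by T2, so it is reported.
checker.acquire("T1", "Mx"); checker.access("T1", "Y"); checker.release("T1", "Mx")
checker.access("T2", "Y")

print(checker.races)
```

A checker of this kind sees only the lock operations that appear in its log. If a lock such as rtnl_lock is taken in the upper layers of the network stack, outside the logged driver code, the driver's accesses look unprotected, which matches the kind of false alarm the slides attribute to external synchronization.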
Efficiency of Driver Lifeguards
Accelerating Lifeguard analysis
  • Static analysis
  • Dynamic optimizations
  • Parallel Lifeguards
  • Hardware accelerators
Reduce overhead of VM fault containment
  • Hardware enforced fault isolation in same VM

Accelerating Driver Lifeguards
[Figure: driver and Lifeguard each running on its own OS, connected through a hardware log]
Reduce analysis workload
  • Static analysis [XFI]
Run analysis faster
  • Dynamic compiler optimizations [Qin06, Ruwase10]
  • Parallel Lifeguards [Nightingale08, Ruwase08]
  • Hardware accelerators [Vlachos10]

Avoid Overhead of VM Fault Containment
• Hardware enforced fault isolation [Nooks, SUD]
• Issues to consider
  • Protection quality
  • Lifeguard using Driver (e.g. disk)
[Figure: below the system call boundary, the Lifeguard runs in kernel space alongside the upper layers of the network stack, the kernel resource managers, and the network driver]

Current and Future Work Summary
Detecting common driver faults
  • Data races
  • Memory
  • Interface violations
Efficiency of Driver Lifeguards
  • Accelerating Lifeguard analysis
  • More efficient fault containment

Timeline
[Figure: Gantt chart with a quarterly axis from Dec-05 to Jun-08]
  • Logging Driver Execution
  • Understanding Drivers
  • Thesis Proposal
  • Data Races
  • Memory Faults
  • OS protocol violations
  • Device protocol violations
  • Performance studies
  • Thesis Writing

Questions?
Thanks to members of the LBA Group for their contributions:
Shimin Chen, Babak Falsafi, Phillip Gibbons, Michelle Goodstein, Michael Kozuch, Onur Mutlu, Todd Mowry, Gennady Pekhimenko, Vivek Seshadri, Theodoros Strigkos, Evangelos Vlachos