Software fault isolation with API integrity and multi-principal modules
Yandong Mao, Haogang Chen (MIT CSAIL), Dong Zhou (Tsinghua University IIIS), Xi Wang, Nickolai Zeldovich, Frans Kaashoek (MIT CSAIL)

Kernel security is important
• The kernel is fully privileged
• Kernel compromises are devastating
  • A remote attacker takes control of the whole machine
  • A local user gains root privilege

The Linux kernel is vulnerable
• Vulnerabilities in Linux are routinely discovered
  • CVE 2010: 145 vulnerabilities in the Linux kernel
• Many exploits attack kernel modules
  • 67% of Linux kernel vulnerabilities (CVE 2010)
• This talk focuses on vulnerabilities in kernel modules

Threat
• A module programmer makes a mistake
• An attacker exploits the mistake to mount attacks
  • Example: a buffer overflow that sets the current UID to root
[Figure: kernel memory holds the module's memory and the UID; the module overwrites the UID: privilege escalation!]

One approach: type-safe languages
• Write the kernel and modules in Java or C#
• No reference to the UID object => the module cannot directly change the UID
  • The attacker cannot synthesize references
• But most kernels are not written in a type-safe language!

Software Fault Isolation (SFI [SOSP93])
• The module cannot bypass the SFI check
  Module:
    char *p = (char *)0xf7;
    sfi_check_memory(p);
    *p = 0;
  SFI runtime:
    void sfi_check_memory(char *p) {
        if (p not in "Module memory")
            stop_module();
    }

Memory safety is insufficient for stopping attacks!
• Challenge: the module needs to call kernel functions
  Core kernel:
    void spin_lock_init(spinlock_t *lock) { lock->v = 0; }
  spin_module:
    spinlock_t mylock;
    spin_lock_init(&mylock);

Problem: API abuse
• The attacker tricks fully privileged kernel code into overwriting the UID
  spin_module:
    spin_lock_init(&cur_proc->uid);   // Privilege escalation!
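The gap between memory safety and API integrity can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: "kernel memory" is a hypothetical array of lock-sized words in which word 0 holds the UID and the remaining words are the module's memory. The names `kmem`, `module_store`, `module_call_spin_lock_init`, `module_stopped`, and `demo` are all hypothetical, and the bounds check is only an illustration of an SFI-style check, not the implementation of any particular SFI system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int v; } spinlock_t;

/* Hypothetical toy layout: "kernel memory" is an array of lock-sized words.
   Word 0 holds the current UID; words 1..4 are the module's own memory. */
enum { UID_WORD = 0, MODULE_FIRST = 1, MODULE_WORDS = 4 };
static spinlock_t kmem[MODULE_FIRST + MODULE_WORDS];

static bool module_stopped;   /* stand-in for stop_module() */

/* SFI-style check placed before a module store: [p, p + size) must lie
   entirely inside the module's memory. */
static void sfi_check_memory(const void *p, size_t size) {
    uintptr_t lo = (uintptr_t)&kmem[MODULE_FIRST];
    uintptr_t hi = lo + MODULE_WORDS * sizeof(spinlock_t);
    uintptr_t a = (uintptr_t)p;
    if (a < lo || a >= hi || size > hi - a)
        module_stopped = true;
}

/* Core-kernel API: fully privileged, trusts its caller, checks nothing. */
static void spin_lock_init(spinlock_t *lock) { lock->v = 0; }

/* A module store after SFI rewriting: the check runs first. */
static void module_store(spinlock_t *p, int value) {
    sfi_check_memory(p, sizeof *p);
    if (!module_stopped)
        p->v = value;
}

/* A module call into the kernel: the module itself performs no store, so
   there is nothing for SFI to check; the write happens in kernel code. */
static void module_call_spin_lock_init(spinlock_t *lock) {
    spin_lock_init(lock);
}

/* Usage: SFI stops the direct write to the UID but not the API abuse. */
static void demo(void) {
    kmem[UID_WORD].v = 1000;                 /* unprivileged UID */
    module_store(&kmem[MODULE_FIRST], 1);    /* own memory: allowed */
    assert(!module_stopped && kmem[MODULE_FIRST].v == 1);

    module_store(&kmem[UID_WORD], 0);        /* direct write: stopped */
    assert(module_stopped && kmem[UID_WORD].v == 1000);

    module_stopped = false;                  /* restart for the second attack */
    module_call_spin_lock_init(&kmem[UID_WORD]);
    assert(kmem[UID_WORD].v == 0);           /* UID is now 0 (root) */
}
```

After `demo()` runs, the UID word is 0 even though every store the module performed itself went through the check, which is the API-abuse problem in miniature.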
Challenge: lack of API integrity
• Kernel APIs are not written defensively
  • They assume the calling module obeys implicit rules
  • They do not check arguments, permissions, etc.
• Problem: modules cannot be trusted to follow the rules
  • A module can trick the kernel into performing unexpected actions
• An ideal system would enforce the rules of the kernel API
  • Analogy: system call code assumes nothing about its caller and checks every assumption

State of the art for protecting APIs
• SFI [SOSP93]: memory safety
• XFI [OSDI06]: no argument checks
• BGI [SOSP09]: manually wrap functions; make the kernel defensive when kernel code invokes callbacks
  • Error-prone and time-consuming
  • Works only if kernel code is well structured (Linux is not)

Our approach: annotation language
• Helps enforce two types of API integrity:
  • Argument integrity: the programmer controls what arguments a module can pass to functions
  • Callback integrity: the kernel invokes a callback only if the module could have invoked the callback directly
• Allows programmers to specify principals for privilege separation within a module
• Less error-prone than manual wrapping, and applicable to complex APIs such as those in Linux

Contributions
• LXFI: a software fault isolation system for Linux kernel modules
• An annotation language for
  • Argument integrity
  • Callback integrity
  • Privilege separation within a module
• Evaluation
  • Few annotations needed for 10 Linux kernel modules
  • Stops three real exploits
  • 2-4X CPU overhead for netperf

Goals for the annotation language
• Enforce argument integrity, callback integrity, and privilege separation within a module
• Minimize programmer effort, e.g.:
  • Few annotations
  • Avoid data structure and API changes
  • Compatible with C

Preventing module exploits
• Programmer: annotates the core kernel
• Compile time: LXFI translates annotations into runtime checks
  • Using compiler plugins
  • Safe default: reject a module if it calls an unannotated API
• Runtime: LXFI performs the checks
  • Consulting a dynamic table of capabilities for each module
• If the annotations capture all implicit rules, a compromised module cannot violate the rules to gain additional privileges

Design of the annotation language
• Argument integrity annotations
  • Using the spin_lock_init example
• Callback integrity annotations
  • Not discussed; see the paper
• Privilege separation annotations
  • Using dm_crypt (a real Linux kernel module)

Enforce argument integrity
• spin_lock_init: three annotations are required
  Part        Syntax             Description
  Capability  write(ptr, size)   Write [ptr, ptr+size)
  Action      check(cap)         Check cap
  Location    pre(action)        Perform action before the function call

Example: enforce argument integrity for spin_lock_init
  Core kernel:
    void spin_lock_init(spinlock_t *lock)
        pre(check(write(lock, sizeof(spinlock_t))));
  spin_module, as rewritten by LXFI:
    lxfi_check_write(mylock, 8);
    spin_lock_init(mylock);
    ...
    lxfi_check_write(&cur_proc->uid, 8);   // fails: no matching capability
    spin_lock_init(&cur_proc->uid);
  spin_module capability table (LXFI runtime): write(mylock, 8)
• Privilege escalation prevented

Where does the capability come from?
• Granted on allocation
• Two more annotations are required
  Part        Syntax         Description
  Action      copy(cap)      Grant a copy of cap
  Location    post(action)   Perform action after the function returns

Example: grant a spinlock
  Core kernel:
    void *kmalloc(size_t size)
        post(copy(write(return, size)));
  spin_module:
    spinlock_t *mylock = kmalloc(8);
    lxfi_copy_write(mylock, 8);
  spin_module capability table (LXFI runtime): write(mylock, 8)

What happens when memory is freed?
• Need to revoke the capability to safely reuse the memory
• Strawman: revoke the capability from the caller
  • Insufficient!
    • Other modules may have copies of the capability
  Part        Syntax          Description
  Action      transfer(cap)   Revoke cap from all modules and grant it; no other copies of the capability remain

Example: safely free a spinlock
  Core kernel:
    void kfree(void *p)
        pre(transfer(write(p, no_size)));
  spin_module:
    lxfi_transfer_write(mylock, -1);
    kfree(mylock);
  Capability tables (LXFI runtime), both revoked by the transfer:
    spin_module:   write(mylock, 8)
    other_module:  write(mylock, 8)

Why is spin_module able to call spin_lock_init, kmalloc, and kfree?
• Call capability
  • Granted initially according to the module's symbol table
    • Trust the module author not to call unnecessary functions
  • Granted dynamically when a callback function is passed
  Part        Syntax    Description
  Capability  call(a)   Call a

  Core kernel:
    void *kmalloc(size_t size)
        post(copy(write(return, size)));
    void spin_lock_init(spinlock_t *lock)
        pre(check(write(lock, sizeof(spinlock_t))));
    void kfree(void *p)
        pre(transfer(write(p, no_size)));
  spin_module capability table (LXFI runtime): call(kmalloc), call(spin_lock_init), call(kfree)
  spin_module:
    spinlock_t *mylock = kmalloc(8);
    lxfi_copy_write(mylock, 8);
    ...
    lxfi_check_write(mylock, 8);
    spin_lock_init(mylock);
    ...
    lxfi_check_write(&cur_proc->uid, 8);
    spin_lock_init(&cur_proc->uid);
    ...
    lxfi_transfer_write(mylock, -1);
    kfree(mylock);

No way for a compromised spin_module to gain root privilege
• SFI ensures memory safety
• Call capabilities ensure that only 3 functions are allowed
• None of these functions can modify the UID, because:
  • kmalloc never modifies allocated memory
  • spin_lock_init can only be called with writable memory (from kmalloc)
  • kfree ensures no capabilities remain after the free
• spin_module cannot modify the UID!

Privilege separation within a module
• dm_crypt: transparent encryption service for block devices
• This example requires a third type of capability
  Part        Syntax      Description
  Capability  ref(t, a)   Pass a as an argument of type t

Privilege separation: normal operation
  User space:
    write("/etc/secret.txt", "foo")
  Kernel space, core kernel calls dm_crypt:
    write(enc_disk, "foo", …)
  dm_crypt, as rewritten by LXFI:
    lxfi_check_ref(block_device, enc_disk->bdev);
    bdev_write(enc_disk->bdev, E("foo"), …);
  Core kernel:
    int bdev_write(block_device *dev, const char *data, …)
        pre(check(ref(block_device, dev)));
  dm_crypt capability table (LXFI runtime): ref(block_device, enc_disk->bdev)
• Writing to a block device does not require write access to the memory of enc_disk->bdev

Privilege separation: the problem
  User space:
    read(…)
  dm_crypt decrypts and, once compromised, issues:
    lxfi_check_ref(block_device, enc_disk->bdev);   // passes
    bdev_write(enc_disk->bdev, "/etc/pwd", "foo");
  Result: /etc/pwd: rootpwd=foo
  dm_crypt capability table (LXFI runtime):
    ref(block_device, enc_disk->bdev)
    ref(block_device, enc_usb->bdev)
• With a single capability table for all of dm_crypt, a compromise triggered through one device can write to another

How to define principals
• Associate a principal with every instance a module supports (e.g., each block device in dm_crypt)
• Problem: how to specify and name principals?
  • Recall the goal: minimize changes to existing data structures
  • Idea: reuse the address of a data structure as the name of the principal
  • The principal can typically be identified from one of the function arguments

Specifying principals
  Part        Syntax           Description
  Principal   principal(ptr)   Run with the privileges of principal ptr
  Action      transfer(cap)    Revoke cap from all principals and grant it

Privilege separation with principals
  Core kernel:
    struct dm_type {
        int (*map)(struct dm_target *di) principal(di);
    };
  Kernel calls dm_crypt's map; LXFI first switches the principal:
    lxfi_set_princ(enc_usb);
    dm_crypt.map(enc_usb);
  LXFI runtime, one capability table per dm_crypt principal:
    enc_disk: write(enc_disk->bdev, 100)
    enc_usb:  write(enc_usb->bdev, 100)
  dm_crypt, running as enc_usb, decrypts and then issues:
    lxfi_check_write(enc_disk->bdev, 100);   // fails: enc_usb holds no such capability
    bdev_write(enc_disk->bdev, "/etc/pwd", "foo");
• The write that produced /etc/pwd: rootpwd=foo is now blocked

Principal name aliasing
• Problem: the kernel may identify one LXFI principal by multiple addresses
  • e.g., e1000_probe receives a pci_dev, while e1000_xmit receives a net_device
    int e1000_probe(struct pci_dev *pcidev) {
        struct net_device *ndev = alloc_etherdev(...);
        ndev->pcidev = pcidev;
        lxfi_princ_alias(pcidev, ndev);
        ...
    }
    int e1000_xmit(struct net_device *dev) { … }
• Insert code into the module to create an alias
  • The same principal now has multiple names

Other annotation language features
  Part        Syntax               Description
  Capability  cap_iterator(obj)    A function that iterates over all capabilities of obj; saves annotation effort for complex objects that need multiple capabilities
  Action      if (c-expr) action   Perform action only if c-expr holds; expresses conditional actions such as granting a capability only if the return value is OK
  Principal   principal(global)    Global: the principal with full privilege
              principal(shared)    Shared: the principal with minimal privilege

Implementation
• Linux 2.6.36, x64, single-core
• gcc plugin: kernel rewriting for callback integrity
• Clang/LLVM plugin: module rewriting
• Annotation propagation saves effort by inferring annotations for module functions

Example: annotation propagation
    // from linux/include/pci_driver.h
    struct pci_driver {
        int (*probe)(struct pci_dev *pcidev)
            principal(pcidev)
            pre(copy(ref(struct pci_dev, pcidev)));
    };

    // linux/drivers/net/e1000/e1000_main.c
    int e1000_probe(struct pci_dev *pcidev) { … }
    struct pci_driver e1000_driver = { .probe = e1000_probe };

    // linux/drivers/net/ixgbe/ixgbe_main.c
    int ixgbe_probe(struct pci_dev *pcidev) { … }
    struct pci_driver ixgbe_driver = { .probe = ixgbe_probe };
• LXFI propagates the annotation on probe to the modules' probe functions (e1000_probe, ixgbe_probe)

Evaluation
• Security
• Annotation effort
• Performance overhead

Security
• Test LXFI with three real privilege escalation exploits
  Exploit   CVE ID                                        Violated property   Unmodified Linux   LXFI
  CAN_BCM   CVE-2010-2959                                 Memory safety       Exploited          Stopped
  Econet    CVE-2010-3849, CVE-2010-3850, CVE-2010-4258   API integrity       Exploited          Stopped
  RDS       CVE-2010-3904                                 API integrity       Exploited          Stopped
• Stopping real attacks requires API integrity

Annotation effort
• Annotate kernel APIs for 10 modules, one at a time
• Count:
  • # of annotated core kernel functions a module calls
  • # of function pointer declarations a module exports to the core kernel

Sharing reduces annotation effort
  Category              Module         # Functions      # Function pointers
                                       All   Unique     All   Unique
  net device driver     e1000          81    49         52    47
  sound device driver   snd-intel8x0   59    27         12    2
                        snd-ens1370    48    13         12    2
  net protocol driver   rds            77    30         42    26
                        can            53    7          7     3
                        can-bcm        51    15         17    1
                        econet         54    15         20    3
  block device driver   dm-crypt       50    24         24    14
                        dm-zero        6     3          2     0
                        dm-snapshot    55    16         28    18
  Total: 334 functions, 155 function pointers

LXFI performance
• netperf, 1 Gigabit e1000 network card, LAN
  • Stresses LXFI
  Test            Throughput (Stock)     Throughput (LXFI)      CPU % (Stock)   CPU % (LXFI)
  TCP_STREAM TX   836 Mbit/sec           828 Mbit/sec           13%             48%
  UDP_STREAM TX   3.1 M/3.1 M pkt/sec    2.0 M/2.0 M pkt/sec    54%             100%
• UDP throughput drops by roughly a third (3.1 M to 2.0 M pkt/sec) as the CPU reaches 100%

CPU time of LXFI actions for netperf
[Figure: CPU time broken down by LXFI action: capability action, mem-write check, function entry, function exit, indirect call check]
• Room for improvement

Future work
• Improve performance
  • Faster capability management, such as BGI's
• Extend the annotation language to enforce other types of API integrity
  • Perhaps based on Singularity's contracts

Related work
• Type-safe kernels: Singularity [MSR-TR05]
  • LXFI provides similar guarantees in C
  • Good support for revocation (transfer) and principals
• Software fault isolation
  • LXFI extends existing SFI systems (SFI, XFI, BGI) with an annotation language

Conclusion
• Extend SFI with an annotation language for:
  • Argument integrity
  • Callback integrity
  • Principals
• LXFI: prototype for Linux
  • Annotated 10 kernel modules
  • Prevented 3 real privilege escalation exploits
  • 2-4X CPU overhead when stressed with netperf

Q&A
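The capability actions walked through in the spin_module and dm_crypt examples (copy on allocation, check before an API call, transfer before free, one table per principal) can be sketched as a small user-space model. This is a minimal sketch assuming a fixed-size table of write capabilities per principal, principals named by an address as the talk proposes, and a simplified `no_size` transfer that revokes every capability covering the freed address. All names (`princ`, `cap_copy_write`, `cap_check_write`, `cap_transfer_write`, `checked_kmalloc`, and so on) are hypothetical; LXFI's actual runtime, table layout, and handling of `no_size` may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of per-principal write capabilities; these names are
   illustrative, not LXFI's runtime API. Principals are named by an address. */
#define MAX_CAPS 16
#define MAX_PRINCIPALS 4

typedef struct { int v; } spinlock_t;
typedef struct { uintptr_t ptr; size_t size; } write_cap;   /* write(ptr, size) */

typedef struct {
    const void *name;
    write_cap caps[MAX_CAPS];
    size_t ncaps;
} principal;

static principal principals[MAX_PRINCIPALS];
static size_t nprincipals;

/* Look up the principal named by `name`, creating it on first use. */
static principal *princ(const void *name) {
    for (size_t i = 0; i < nprincipals; i++)
        if (principals[i].name == name)
            return &principals[i];
    assert(nprincipals < MAX_PRINCIPALS);
    principal *p = &principals[nprincipals++];
    p->name = name;
    p->ncaps = 0;
    return p;
}

/* copy(write(ptr, size)): grant p a copy of the capability. */
static void cap_copy_write(principal *p, const void *ptr, size_t size) {
    assert(p->ncaps < MAX_CAPS);
    p->caps[p->ncaps++] = (write_cap){ (uintptr_t)ptr, size };
}

/* check(write(ptr, size)): does some capability of p cover [ptr, ptr + size)? */
static bool cap_check_write(const principal *p, const void *ptr, size_t size) {
    uintptr_t a = (uintptr_t)ptr;
    for (size_t i = 0; i < p->ncaps; i++) {
        const write_cap *c = &p->caps[i];
        if (a >= c->ptr && size <= c->size && a - c->ptr <= c->size - size)
            return true;
    }
    return false;
}

/* Simplified transfer(write(ptr, no_size)): revoke every write capability
   covering ptr from ALL principals, so no copies remain. The receiving core
   kernel is fully privileged, so this model keeps no table for it. */
static void cap_transfer_write(const void *ptr) {
    uintptr_t a = (uintptr_t)ptr;
    for (size_t i = 0; i < nprincipals; i++) {
        principal *p = &principals[i];
        size_t kept = 0;
        for (size_t j = 0; j < p->ncaps; j++) {
            const write_cap *c = &p->caps[j];
            if (!(a >= c->ptr && a - c->ptr < c->size))
                p->caps[kept++] = *c;   /* unrelated capability: keep it */
        }
        p->ncaps = kept;
    }
}

/* kmalloc with post(copy(write(return, size))). */
static void *checked_kmalloc(principal *p, size_t size) {
    void *mem = malloc(size);
    if (mem != NULL)
        cap_copy_write(p, mem, size);
    return mem;
}

/* spin_lock_init with pre(check(write(lock, sizeof(spinlock_t)))). */
static bool checked_spin_lock_init(principal *p, spinlock_t *lock) {
    if (!cap_check_write(p, lock, sizeof *lock))
        return false;   /* where LXFI would stop the module */
    lock->v = 0;
    return true;
}

/* kfree with pre(transfer(write(p, no_size))). */
static void checked_kfree(void *mem) {
    cap_transfer_write(mem);
    free(mem);
}
```

For example, if `checked_kmalloc` runs on behalf of one principal, `checked_spin_lock_init` succeeds for that principal but fails for a second principal that holds no copy, and it also fails for a pointer to a UID variable that was never granted. After `checked_kfree`, neither principal retains a capability for the freed block, even if the second principal had been given a copy in the meantime. Revoking across every table, rather than only the caller's, is what the strawman in the talk misses.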