Nelson Elhage Black Hat USA 2011 Introduction Related work Background Knowledge Attack Detailed CVE 2011-1751 Bug Detailed Exploit Detailed (Take Control of %rip) Inject Shellcode into host Disable non executable page Bypassing ASLR Conclusions Reference It was found that the PIIX4 Power Management emulation layer in qemu-kvm did not properly check for hot plug eligibility during device removals. A privileged guest user could use this flaw to crash the guest or, possibly, execute arbitrary code on the host. (CVE-2011-1751) a generic and open source machine emulator and virtualizer. Three components: Kvm.ko Kvm-intel.ko or kvm-amd.ko Qemu-kvm The core KVM kernel module Provides ioctls for communicating the kernel module Primarily responsible for emulating the virtual CPU and MMU Emulates a few devices in-kernel for efficiency Contains an emulator for a subset of x86 used in handling certain traps Provides support for Intel’s VMX and AMD’s SVM virtualization extensions Relatively small compared to the rest of KVM Provides the most direct user interface to KVM Based on the classic x86 emulator Implements the bulk of the virtual devices a VM uses Implements a wide variety of possible devices and buses An order of magnitude more code than the kernel module Static QEMUTimer *active_timers[QEMU_NUM_CLOCKS] Struct QEMUTimer { QEMUClock *clock; int64_t expire_time; QEMUTimerCB *cb; /* call back function*/ void *opaque; /* parameter */ struct QEMUTimer *next; /* link list */ } Active_timers QEMUTimer Related functions: Qemu_new_timer: allocate a memory region for the new timer. Qemu_mod_timer: modify the current timer add it to link list. Qemu_run_timers: loop through the link list and execute the timer structure call back function with the opaque as the parameter The main_loop_wait function will iterate through the active_timers and call qemu_run_timers() A computer clock that keep track of the current time MC146818 RTC hardware manual can be found http://wiki.qemu.org/File:MC146818AS.pdf RTCState structure Struct RTCState { ….. QEMUTimer *second_timer; QEMUTimer *second_timer2; } Related functions: Rtc_initfn : initialize the RTC Rtc_update_second : update the expire time of the QEMUTimer and add it to the link list. rtc_initfn : RTCState *s = …. s->second_timer = qemu_new_timer(rtc_clock, rtc_updated_second, s) s->second_timer2 = qemu_new_timer(rtc_clock, rtc_update_second2, s) qemu_mod_timer(s->second_timer2, s->next_second_time) Active_timer Rtc_update_second ……… Second_timer Second_timer2 Cb opaque Next Cb opaque Next QEMUTimer QEMUTimer ………. ……….. ………. ………. ……….. ………. RTCState Rtc_update_second2 A south bridge chip. Default south bridge chip used by qemu-kvm Include ACPI, PCI-ISA, and an embeded MC146818 RTC. Support PCI device hotplug, write values to IO port 0xae08 Qemu use qdev_free to emulate device hotplug. Certain devices don’t support device hotplug but qemu didn’t check this. It should not be possible to unplug the ISA bridge KVM’s emulated RTC is not designed to be unplugged. Did not check The device can Be unplug or not Being dealloc Add the second timer to link list. #include <sys/io.h> Int main(){ } iopl(3); outl(2, 0xae08); return 0; Unplug RTC Active_timer Cb opaque Next QEMUTimer Rtc_update_second ……… Second_timer RTCState Cb opaque Next ………. ……….. ………. QEMUTimer Unplug RTC Active_timer Cb opaque Next Second_timer RTCState Cb opaque Next …… …… …… QEMUTimer Rtc_update_second ………. ……….. ………. QEMUTimer Dummy memory region Return to main_loop_wait Call qemu_run_timers Active_timer Cb opaque Next Second_timer RTCState Cb opaque Next …… …… …… QEMUTimer Rtc_update_second ………. ……….. ………. QEMUTimer Dummy memory region QEMUTimer call back Rtc_update_second(opaque) Active_timer Cb opaque Next Second_timer RTCState Cb opaque Next …… …… …… QEMUTimer Rtc_update_second ………. ……….. ………. QEMUTimer Dummy memory region Next Main_loop_wait Active_timer Cb opaque Next Second_timer RTCState Cb opaque Next …… …… …… QEMUTimer Rtc_update_second ………. ……….. ………. QEMUTimer Dummy memory region 1. Inject a Controlled QEMUTimer into qemu-kvm 2. Eject ISA bridge 3. Force an allocation into the freed RTCState, with second timer point to our fake QEMUTimer The guest RAM is backed by mmap()ed region inside the qemu-kvm process. Allocate in the guest RAM and calculate the the host address by the following formula: Hva = physmem_base + gpa gpa = page_traslation(gva) <= linux kernel project 1 Gva = guest virtual address Gpa = guest physical address Hva = host virtual address Physmem_base = mmap start region For now assume we know physmem_base(no aslr) Force qemu to call malloc Utilize the qemu-kvm user-mode networking stack Qemu-kvm implement DHCP server, DNS server and NAT gateway in user-mode networking stack User-mode stack normally handle packets synchronously To prevent recursion, if a second packet is emitted while handling a first packet, the second packet is queued using malloc. ICMP ping. 1. Allocate a Fake QEMUTimer 2. calculate the Fake timer address 3. unplug ISA bridge 4. ping the gateway containing pointers to your fake timer. Allocate Fake QMEUTimer Active_timer Cb opaque Next QEMUTimer Rtc_update_secon ………. ……….. ………. ……… Second_timer RTCState Cb opaque Next Cb opaque Next QEMUTimer Fake QEMUTimer Evil function (Shellcode) ………. ……….. ………. Unplug ISA bridge Ping the gateway Active_timer Cb opaque Next Second_timer RTCState Cb opaque Next Cb opaque Next QEMUTimer Rtc_update_secon ………. ……….. ………. QEMUTimer Fake QEMUTimer Evil function (Shellcode) ………. ……….. ………. First Main_loop_wait Active_timer Cb opaque Next Second_timer RTCState Cb opaque Next Cb opaque Next QEMUTimer Rtc_update_secon ………. ……….. ………. QEMUTimer Fake QEMUTimer Evil function (Shellcode) ………. ……….. ………. Second Main_loop_wait Active_timer Cb opaque Next QEMUTimer Second_timer RTCState Evil function (Shellcode) Cb opaque Next Fake QEMUTimer ………. ……….. ………. 1. we have %rip control 2. Where is the Evil function Inject shellcode to host virtual memory Host virtual memory has page protection(NX bit) 3. Solutions: A. ROP B. something clever 1. we can control the QEMUTimer data structure. 2. create multiple QEMUTimer object and chain them together. QEMUTimer Cb opaque Next Cb opaque Next Cb opaque Next ………. ……….. ………. ………. ……….. ………. ………. ……….. ………. F1(X) F2(Y) F3(Z) We now have multiple on argument function calls. We want to do more arguments function calls. For example, mprotect take three arguments. Arguments of types Bool, char, short, int, long, long long, and pointers are in the INTEGER class. If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used More detailed check out the reference 7 Suppose we can find a function with the following property. Set_rsi: movl %rdi, %rsi; return Let f1(x) be set_rsi %rsi register will not be modified during qemu_run_timer() in most qemu version. Therefore, F2(y) becomes F2(y,x) since we control the %rsi from f1(x) Void cpu_outl(pio_addr_t addr, uint32_t val) { ioport_write(2, addr, val); } This function will copy its first parameter to the second parameter of ioport_write %rdi is the first parameter and %rsi is the second parameter. Therefore we get a function with the previous property. (Movl %rdi, %rsi) Mprotect prototype: Mprotect(addr, lens, prot) PROT_EXEC = 4 Use the following function We control the “opaque/ioport” by QEMUTimer and control the “addr” by set_rsi() Seems like we control everything in this function Allocate a fake IORangeOps with fake_ops->read = mprotect Allocate a page-aligned IORange with Fake_ioport->ops = fake_ops Fake_ioport->base = -PAGE_SIZE Copy shellcode following the IORange Construct a timer chain that calls Cpu_outl(0, *) Ioport_readl_thunk(fake_ioport, 0) Fake_ioport + 1 mprotect QEMUTimer Chain Cb opaque Next Cb opaque Next ……. ……. ……. Cb opaque Next ops ……. ……. ……. ……. ……. ……. Read Fill with shellcode IORangeOps Cpu_outl Ioport_readl_thunk IORange (PAGE_ALIGN) The base address of the qemu-kvm binary, to find code address(such as mprotect ….) Physmem_base, the address of the physical memory mapping inside kvm Solutions: Find an information leak Assume non-PIE. Every major distribution compile qemu- kvm as non position independent executable. How about physmem_base Emulated IO ports 0x510 (address) and 0x511 (data) Used to communicate various tables to the qemu BIOS (e820 map, ACPI tables, etc) Also provides support for exporting writable tables to the BIOS However, fw_cfg_write doesn’t check if the target table is supposed to be writable Several fw_cfg areas are backed by statically-allocated buffers. Net result: nearly 500 writable bytes inside static variables. Mprotect needs a page-aligned address, so these aren’t suitable for our shellcode We can construct fake timer chains in this space to build a read4() primitive. (Create Information Leak) Follow pointers from static variables to find physmem_base Proceed as before Sandbox qemu-kvm Build qemu-kvm as PIE Lazily mmap/mprotect guest RAM XOR-encode key function pointers More auditing and fuzzing of qemu-kvm VM breakouts aren’t magic Hypervisors are just as vulnerable as anything else Device drivers are the weak spot. [1] http://qemu.weilnetz.de/qemu-tech.html [2] http://qemu.weilnetz.de/doxygen/structRTCState.html [3] http://www.linuxinsight.com/files/kvm_whitepaper.pdf [4] https://www.ibm.com/developerworks/cn/linux/l-virtio/ [5] http://smilejay.com/kvm_theory_practice/ [6] http://www.linux-kvm.org/page/Documents [7] http://www.cs.tufts.edu/comp/40/readings/amd64-abi.pdf [8]http://linuxfromscratch.xtra-net.org/hlfs/view/unstable/glibc- 2.4/chapter02/pie.html qemu source code