Windows Kernel Vulnerability Research and Exploitation
Gilad Bakas

Presentation Overview
• Why Kernel?
• What’s Different?
• Technical Background
• Vulnerability Research
• Common and less common kernel bugs
• Exploit Development
• Examples: Use-after-free, DRM
• Tips & Tricks
• AFD.SYS: A simple kernel bug
• Win32k.sys: A complex kernel bug
• Windows 8 and the future
• Questions

Why Kernel?
• Used to be much harder
• With the introduction of DEP, ASLR, UAC, heap checks, Protected Mode, sandboxes, etc. in User Mode, kernel exploitation is now on par and sometimes even easier
• In parallel to the securing of User Mode, a lot of OS functionality was moved from User to Kernel, and new User-to-Kernel interfaces were introduced, thus drastically increasing the attack surface in the Kernel

Why Kernel Cont’d
• On 64bit systems, the Driver Signing Requirements prevent even an Administrator from running unsigned Kernel code, making exploitation the only alternative.
• Many times, the payload uses a driver anyway, so it’s easier to just start from the Kernel

This is already happening
• Quoted from a November 4 article by Gregg Keizer in Computerworld: “Microsoft has been extremely busy patching pieces of the Windows kernel this year. So far during 2011, Microsoft has patched 56 different kernel vulnerabilities with updates issued in February, April, June, July, August and October. In April alone, the company fixed 30 bugs, then quashed 15 more in July.”

What’s different?
• If something goes wrong, it goes REALLY wrong. Even the smallest glitch leads to a BSOD and a system reboot.
• No need to worry about permissions
• You have to master a lot more technical knowledge.
• No process boundaries.
  This means that you have a lot more to play with, but also a lot more to mess with

Required Technical Background
Things you have to master before you even begin
• Kernel APIs
• Memory Layout
• Interrupts, IRQLs, DPCs, IRPs
• Synchronization: Events, Spinlocks, Mutexes, Timers, Semaphores, Resources
• Paging mechanism
• Intel System Architecture
• Device Driver structure, MJ functions, IOCTLs

Vulnerability Research
• High-Level first
• Look for complexity
• Real challenge is figuring out where NOT to look
• Interfaces where different teams have to cooperate are more vulnerable – e.g. interaction between User and Kernel
• Privilege Escalations are much easier than Remote
• Multiple weak exploits can form one strong attack

Vulnerability Research – Cont’d
• Three approaches for finding vulnerabilities:
  1. The High Level approach
     “Let’s first understand how this whole system works, and only then look for the holes.”
  2. The Low Level approach
     “This function looks complex, let’s break it down to the bit and see if it has any bugs”
  3. The blackbox / brute force / fuzzing approach
     “Let’s make this mothafucka crash by trying every possible input. We’ll worry about the details later”

Vulnerability Research – Cont’d
High Level approach process
– Read everything you can possibly find about your target: white papers, help documents, bug reports, users forums etc.
– Ask yourself “If I had to develop this code, how would I do it?”
– Research to find all the possible ways to develop that functionality
– Now that you know how it can be done, go to the code and find out which method it uses. You now have a high level overview of how your target works!
– With the knowledge of how it works, think of possible weaknesses, then search for them in the code

Vulnerability Research – Cont’d
Low Level approach process
– Identify logically and technically complex functions / operations in the code
– Completely analyze and/or reverse engineer the relevant functions, looking for bugs
– If a bug is found, figure out if there’s a way to generate an attack flow that will trigger it

Vulnerability Research – Cont’d
Blackbox / brute force / fuzzing approach process
1. Identify all the possible code inputs that are under your control
2. Find out the structure of the input fields, including calculated data like CRCs, lengths, etc.
3. Test different inputs by:
   – Manually thinking of inputs that are likely to be mishandled
   – Writing a script/program to generate inputs for you
   – Using a fuzzing script / program / infrastructure
4. The goal is to get the best code coverage

Common Bugs
• Buffer overflows (stack and pool)
• NULL dereference
• Faulty input validation

Less common bugs
• Use-after-free
• Direct calls into User code
• Logical bugs

Exploit Development
• The more knowledge you have the better:
  – Constant memory addresses
  – Memory layouts
  – Heaps, Pools, Stacks
  – APIs, Objects
  – CPU
  – Assembly
• Creativity
• In kernel mode there are no process boundaries – we can use everything

Example 1: Use-after-free
• The bug: an object is freed but still kept in a linked list of active objects
• To exploit it we needed to get our own data into the freed buffer *before someone else does*
• The solution:
  – Use a bug in one driver to cause CPU starvation, thus reducing the chance of anyone else stealing our buffer
  – Use some DPC code in a second driver that allocates a buffer with the right size and copies our data into it
  – Activate the code in our target driver that uses the freed object, causing our shellcode to be run

Example 2: DRM
• This isn’t actually a kernel exploit, but it’s a great example of:
  – An insecure interface between User and Kernel
    code
  – How the interface points between different development teams are likely to be the weakest links
• The system in this example is a DRM system that was meant to prevent movies from being copied by allowing playback on one machine only

Example 2: DRM Cont’d
• Every movie is encrypted
• Decryption code is embedded in the movie
• The code is different in each movie
• License is given per-computer, based on hardware signature
• Accessing the hardware requires Kernel code, so the decryption code inside the movie calls a small driver, which calls straight back into the user code

[Diagram: Ring0 (Kernel-Mode): PCCP Device Driver. Ring3 (User-Mode): Movie, containing the Movie Decryption Code.]

Example 2: DRM Broken
• Hook DeviceIoControl
• Instead of calling the driver, we call the user-mode code within a try…catch statement
• Any attempt to do something nasty like reading the BIOS and accessing hardware will generate an exception
• We handle the exceptions by reading the data from a file instead of the BIOS/hardware
• The decryption code is tricked into “seeing” the same hardware on all computers, so we can use the same license everywhere!

[Diagram: Ring0 (Kernel-Mode): PCCP Device Driver. Ring3 (User-Mode): Injected Code, plus the Movie containing the Movie Decryption Code.]

Tips & Tricks
• Arbitrary write – good places to overwrite:
Generic places with fixed addresses (per OS build):
  – Callback function pointers
  – Data segment variables
  – GDT/LDT tables (http://j00ru.vexillium.org/?p=290)
  – Dispatch Table (used to be great, but it’s blocked on new OSs)

Tips & Tricks – Cont’d
Non-fixed addresses that can be extracted:
• New technique (thanks to Gil Dabah and Tarjei Mandt): The Window Handle Table is mapped to user address space and contains Kernel pointers to objects with function pointers in them (see http://www.mista.nu/research/mandt-win32k-paper.pdf)
• So it’s possible to:
  1. Create a kernel window (a window for which win32k created and registered a window class so the window procedure is in kernel, such as menu and tooltip)
  2.
     Get the pointer to the kernel window object from the Handle Table
  3. Overwrite the WndProc pointer
  4. Send a message to the window to trigger the WndProc pointer

Tips & Tricks – Cont’d
Non-fixed addresses that can be extracted:
  – Other Kernel pointers that are passed to user space.
  – Some Win32k.sys syscalls are defined as VOID or USHORT and leak full or partial kernel pointers in the return value (see http://j00ru.vexillium.org/?p=762)

Tips & Tricks – Cont’d
• A BSOD is not the end:
  – There’s plenty of code that runs AFTER an exception, and many times that code calls callback functions that can be overwritten. Especially with ACCESS_VIOLATIONs, the flow goes to the page fault handler first, so there are plenty of options for attack
  – Even inside KeBugCheck there are callbacks that can be overwritten
  – It’s a bit tricky to fix the context and resume normal execution afterwards – but it can be done.

Tips & Tricks – Cont’d
• WOW64 processes:
  – When running in a 32bit process on a 64bit system and calling NtQuerySystemInformation, all the returned pointers are truncated to 32bit – Very annoying!
  – This can be overcome by using the built-in call gate to temporarily switch to 64bit, call NtQuerySystemInformation, then return to your 32bit code. For more information see http://vxheavens.com/lib/vrg02.html (and thanks to Mark Dowd for the tip!)
  – The 64bit TEB can be accessed directly, without all the switching-to-64bit mess, since it’s mapped at gs:0

From Kernel to User
• Many times, kernel exploits are required to install user-mode payloads or perform operations that require running user-mode code
• Contrary to common logic that says the more power the better, launching user-mode code that runs with SYSTEM privileges from the Kernel can be very tricky (due to the lack of API and OS support to do so).
• In the following slides we’ll go over the different techniques that can be used, and the pros and cons of each of them

From Kernel to User – Changing the process token
Method
Change the token of a process we already have control of (e.g. the process that launched the exploit) to a SYSTEM token
Pros
– Easiest way to implement
– Very reliable
Cons
– Noisier
– The user-mode code has to do all the nasty work (e.g. injecting code into a system process), making it vulnerable to AV and security programs that hook user APIs

From Kernel to User – User-mode APCs
Method
Queue a user-mode APC to a target thread already running in a system process.
Pros
– Gets you directly to where you want to be
– Allows injection into any process on the system
Cons
– Only threads in an Alertable state can be targeted, and there is no generic way to find them. An alternative is to force a thread into an Alertable state, but this breaks its waiting state, causing the wait function to return mid-way, and may lead to system instability or a crash.
– Largely undocumented, and the relevant structures differ between OS versions.
– Unless targeting a thread you have intimate knowledge of, this method may lead to deadlocks if the thread is holding some locks when it enters the wait state (e.g. the LoaderLock)

From Kernel to User – Thread Hijacking
Method
Change the context of an existing thread in a system process to execute injected code.
Pros
– Gets you directly to where you want to be
– Allows injection into any process on the system
Cons
– Restoring the context can be very difficult.
– Hijacking an arbitrary thread is extremely dangerous and may cause deadlocks, instability, or crashes

From Kernel to User – Creating a new thread
Method
Create a new user-mode thread in a system process
Pros
– An almost perfect solution, gets you exactly to where you want without any dangers or context issues.
– Allows injection into any process on the system
Cons
– Extremely difficult to implement.
  In order for the new thread to function it has to be registered with CSRSS. The APIs and structures involved with that are complex, undocumented, and change constantly with Windows updates.

From Kernel to User – API hooking
Method
Hook a user-mode API that you know is going to be called or that you can cause to be called within a system process
Pros
– Allows injecting directly into a system process
– Very reliable
Cons
– Finding a suitable API to hook may be difficult.
– This method isn’t generic, and will only work on system processes that frequently call the targeted API.

An example of a simple Kernel Exploit – AFD.SYS
• Let’s have a look at afd!AfdGetRemoteAddress
Can someone see the problem?

    // Attacker controls OutputBuffer and OutputBufferLength
    void IOCTL_handler(...)
    {
        [...]
        try {
            ProbeForWrite(OutputBuffer, OutputBufferLength, sizeof(UCHAR));
            RtlCopyMemory(
                OutputBuffer,
                (PUCHAR)context + endpoint->Common.VcConnecting.RemoteSocketAddressOffset,
                endpoint->Common.VcConnecting.RemoteSocketAddressLength
            );
        } except( AFD_EXCEPTION_FILTER(&status) ) {
            [...]
        }
    }

Hint: ProbeForWrite doesn’t throw an exception when length == 0, regardless of the actual pointer

AFD.SYS - Continued
• OK, so we can write data to any address we want, including kernel addresses, but we can’t really control what data!
• The data written looks like this: 02 00 XX XX YY YY YY YY, where XXXX is the port and YYYYYYYY is the IP, and there has to be an active TCP connection for the function to work
• What to do?

AFD.SYS - Continued
• Our options:
  – Overwriting a flag
  – Maybe we don’t need full control of the data?

AFD.SYS - Continued
• Solution:
  – We can connect to 127.0.0.1, that’s 7F 00 00 01.
  – Port 445 is always open on Windows machines, that’s 01 BD, so now we have 01 BD 7F 00 00 01
  – We want to overwrite a 32bit pointer, and we need an address that we can easily allocate
  – How about the middle four bytes of 01 BD 7F 00 00 01, i.e. BD 7F 00 00? Intel is Little Endian, so those bytes read as the 32bit value 0x00007FBD. Perfect!
  – Now we just need a pointer to overwrite. Since this is an old bug that only works on XP, we can just use the Dispatch Table.

AFD.SYS - Exploit
1. Allocate page at 0x7fb0 and copy the shellcode into it.
2. HookAddress = Dispatch table entry for ZwQueryIntervalProfile
3. connect() to 127.0.0.1:445
4. DeviceIoControl((HANDLE)sock, 0x1203F, NULL, 0, (PVOID)(HookAddress - 3), 0, &Result, NULL)
5. ZwQueryIntervalProfile()

AFD.SYS - Demo
Demo

Walk-through of a complex Kernel PE Exploit
• Thanks to my friend Gil Dabah (creator of diStorm Disassembler)
• This bug was silently fixed by MS in February

Background
• When registering a Window Class it’s possible to request the OS to store some extra bytes with the window object
• The extra bytes are appended to the WND structure in the kernel:

[Diagram: WND Struct | Extra Bytes]

Background - Continued
• Some special window types (Menus, Tooltips, etc) also have some private data that can only be accessed by the kernel:

[Diagram: WND Struct | Private Data | Extra Bytes]

Background - Continued
• To change the data in the extra bytes, applications call the SetWindowLongPtr function with the index into the extra bytes and a new value.
• The function then checks if the index provided is within the private data or the user extra bytes. If the index is within the private bytes, the function fails, so normally it’s impossible to change the private kernel data.

[Diagram: WND Struct | Private Data | Extra Bytes]

Background - Continued
• In order to check if the index is within the private data, SetWindowLongPtr uses a table of window types with their corresponding total allocated bytes size (WND struct + private).
• “Window type” refers to FNID, which is the real identifier of a window type, from a list of hard-coded values (unlike its Class).
• The pseudo code for the check is:

    if (index < (int)(window_class_alloc_sizes[fnid] - sizeof(WND))) FAIL;

    Window Type (FNID)    Allocated bytes
    Menu                  0xa4 (WND size) + 4 (private bytes) == 0xa8
    Tooltip               0xa4 (WND size) + 4 (private bytes) == 0xa8
    ...

The bug – part 1
• By using the undocumented and unexported function RegisterClassExWOWW and supplying an internal window type and a negative number for the extra bytes, it’s possible to overwrite the table with our own value. The bug is that the extra bytes value isn’t verified:

    Window Type (FNID)    Allocated bytes
    Menu                  0xa4 (WND size) + (-0xa8) (extra bytes) == -4
    Tooltip               0xa4 (WND size) + 4 (private bytes) == 0xa8
    ...

The bug – part 1 - continued
• With the table altered to have a negative number as the # of allocated bytes, the test code is tricked. For any non-negative index:

    (index < (int)(window_class_alloc_sizes[fnid] - sizeof(WND))) == always FALSE

• We can now call SetWindowLongPtr with 0 as index and change the private kernel data for the window

[Diagram: WND Struct | Private Data | Extra Bytes]

The bug – part 2
• Now that we can overwrite private kernel data, we need to find a window type that has some useful stuff stored there.
• The Menu window type stores a pointer to a structure, and during window destruction, a pointer in that structure is NULLed, giving us the ability to NULL any 32/64 bit value in the system – Bingo!
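
The arithmetic behind the tricked check can be replayed in user mode with plain integers. The following is only a sketch under stated assumptions: `WND_SIZE`, `index_is_rejected`, `table_entry`, `extra_bytes_valid` and `replay_slide_numbers` are hypothetical names invented for illustration, the 0xa4 size is the one quoted in the table, and none of this is win32k code or Microsoft’s actual fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for sizeof(WND); 0xa4 is the value quoted on the slide. */
#define WND_SIZE 0xa4

/* Mirrors the slide's pseudo code: returns true when the call must FAIL
 * because index falls inside the private (kernel-only) bytes. */
bool index_is_rejected(int index, int32_t alloc_size)
{
    return index < (int)(alloc_size - WND_SIZE);
}

/* Table entry as the slide computes it: WND size plus the byte count supplied. */
int32_t table_entry(int32_t extra_bytes)
{
    return WND_SIZE + extra_bytes;
}

/* The kind of validation the slide says was missing (an assumed fix). */
bool extra_bytes_valid(int32_t extra_bytes)
{
    return extra_bytes >= 0;
}

/* Replays the numbers from the two tables. */
void replay_slide_numbers(void)
{
    int32_t intact  = table_entry(4);       /* 0xa4 + 4       == 0xa8 */
    int32_t altered = table_entry(-0xa8);   /* 0xa4 + (-0xa8) == -4   */

    assert(intact == 0xa8);
    assert(altered == -4);
    assert(index_is_rejected(0, intact));    /* private bytes are protected   */
    assert(!index_is_rejected(0, altered));  /* 0 < -0xa8 is false: no FAIL   */
    assert(!extra_bytes_valid(-0xa8));       /* a sign check would reject it  */
}
```

With the intact entry the threshold is 4, so indexes 0 to 3 are refused; with the altered entry the threshold becomes -0xa8, so no non-negative index is ever refused.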
Exploitation
• Since the Menu window private structure changes between Windows versions, we run the exploit twice:
  – The first time overwriting the pointer to the structure with a pointer to some non-NULL array, so that we can find out the offset where the NULL is put
  – The second time with a pointer to the address we want to NULL minus the offset found in the first stage

[Diagram: WND Struct | Private Data | Extra Bytes, shown for both runs. 1st time: the private pointer targets bogus data, and the NULL appears at some offset inside it (the NULL offset). 2nd time: the private pointer targets real data so that the NULL lands on the address to overwrite.]

Exploitation - continued
• All that’s left now is to allocate our shellcode at page 0, overwrite a function pointer, and then get it called
• Easy!

Exploitation - flow
1. Find the address of RegisterClassExWOWW using diStorm
2. RegisterClassExWOWW() passing the FNID for a Menu and a WNDCLASSEX structure with a negative number for the extra bytes
3. CreateWindow()
4. SetWindowLongPtr() with a non-NULL array
5. DestroyWindow()
6. Find offset
7. Repeat steps 3-5, this time passing the actual address to overwrite minus the offset
8. Get the overwritten pointer to be called

Windows 8 and the future
• Null-dereference is blocked – first 64k can’t be allocated
• New integrity checks in the kernel pool memory allocator (see http://blogs.msdn.com/b/b8/archive/2011/09/15/protecting-you-from-malware.aspx)
• Improved Linked-List security to protect against corrupted/dangling list pointers (see http://www.alex-ionescu.com/?p=69)

Questions?
gbakas@gmail.com