VM Introspection for Cognitive Immunity (VICI)
Komoku, Inc.
Tuesday 18 December 2007
Talk: Tim Fraser tfraser@komoku.com
Demo: Matt Evenson mevenson@komoku.com

Agenda
1. Project status update.
2. New repair strategies.
3. New control architecture.
4. Summary and conclusions.

Copyright (C) 2007 Komoku, Inc.

The VICI approach

VICI detects kernel-modifying rootkits and repairs the infected kernel.

[Diagram: VICI, running under Xen, works on the kernel in a cycle: 1. Run diagnostics, 2. Attempt repair, 3. Evaluate repair, 4. Learn.]

GOAL
1. Self-diagnosis: < 50% false negative rate, < 10% false positive rate.
2. Self-healing: Repair within 250ms of infection.
3. Cognitive immunity: Learn from repeated attacks: Escalate to optimize response time. De-escalate to reduce harm.

Project timeline, goals, and progress

[Timeline axis: (Jun 07) Q1 Q2 (Dec 07) Q3 Q4 (Jun 08) Q5 Q6]

Phase 1 prototype: Basic diagnostics and repairs.
Phase 2 prototype: Add advanced repairs, Brooks-style control architecture, learning for (de)escalation.
Phase 3 (final) prototype: Increase Surgical layer coverage for Red Team exercises.

Progress towards goals

On schedule. Deliverables as proposed. Some insight, experience gained. More expected from Red Team exercise.

GOAL
1. Self-diagnosis: < 50% false negative rate, < 10% false positive rate.
2. Self-healing: Repair within 250ms of infection.
3. Cognitive immunity: Learn from repeated attacks: Escalate to optimize response time. De-escalate to reduce harm.

STATUS
o Five effective strategies.
o Additional one discarded.
o Checkpoint (CP), Reboot effective but slow.
o Need to increase coverage of most basic “Surgical” strategy.
o Need to see how much we can cram into 250ms.
o Escalation, De-escalation works.
o Ready for testing.

What’s new?

[Diagram labels: VICI, Xen, kernel; 1. Run diagnostics; 2. Attempt repair.]

• New repair strategies: Core War, Hitman, Checkpoint, Reboot.
• New control architecture to map diagnoses to repairs.
• Agent learns current threat sophistication level and adjusts how it chooses responses.

[Diagram labels, continued: 3. Evaluate repair; 4. Learn.]

Learning the present threat level

[Diagram: the VICI Agent's mood selects the repair strategy applied to the VM kernel.]

  :-)    Surgical
  :-|    Core War
  :-(    Hitman
  >:-(   Checkpoint
  >:-O   Reboot

• Agent gets “angry” when repairs fail repeatedly.
• Angry Agent switches to more extreme repair strategies.
• Extreme repairs may defeat clever rootkits, but they may also destroy useful kernel state ( == cost).
• Successful repairs make Agent calm down, back down from extreme repairs.
• This escalation and de-escalation makes the Agent learn and adjust to the current level of attack sophistication.

Part 2: new repair strategies

Surgical repair on basic Ttysnoop

[Diagram: user app, system call vector, rootkit, kernel text. Repair applied: surgical. States shown: infected, repaired.]

Surgical repair is simple and does not cause collateral damage.

Core War on Ttysnoop w/snoopd

[Diagram: user app, system call vector, rootkit, kernel text. Repairs applied: surgical, surgical, core war. States shown: infected, surgical repair ineffective, repaired.]

Core War repair leaves bad control flow but renders rootkit harmless.

How Core War works

[Diagram: the System Call Table entry for sys_read points to Ttysnoop's fake sys_read(), which calls the real sys_read(), prints if the data is a password, and returns to the caller.]

1. Core War drops in code to jump to the real function at the top of the fake routine.
   • Same two-instruction code snippet works for everyone:
   • Leave stack the same, jump to the real function’s start address.
2. Core War writes NOPs from that point down to the beginning of the stack cleanup and return code.
   • Only threads that already went through the rootkit before the repair return through these NOPs.
   • Threads that arrive after the repair jump to the real function and never return to the rootkit.
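The two Core War steps can be sketched in C as follows. This is a minimal illustration, not the VICI implementation: the slides do not say which two instructions the snippet uses, so the x86-32 `push imm32` / `ret` pair here is an assumption, as are the function name `core_war_neuter`, its parameters, and the idea of patching a local byte buffer rather than guest memory through the hypervisor.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* x86-32 opcodes used by this sketch (assumed encoding, not from the slides). */
enum {
    OP_PUSH_IMM32 = 0x68, /* push imm32 */
    OP_RET        = 0xC3, /* ret: pops the pushed address into EIP */
    OP_NOP        = 0x90, /* one-byte no-op */
    STUB_LEN      = 6     /* 1 opcode + 4 address bytes + 1 opcode */
};

/*
 * Neuter a fake routine held in `fake` (hypothetical local copy of rootkit text).
 *   cleanup_off: offset of the routine's stack cleanup and return code
 *   real_addr:   start address of the real kernel function
 * Returns 0 on success, -1 if there is no room for the stub.
 */
int core_war_neuter(uint8_t *fake, size_t cleanup_off, uint32_t real_addr)
{
    if (fake == NULL || cleanup_off < STUB_LEN)
        return -1;

    /* Step 1: two instructions at the top of the fake routine.
     * push + ret leaves the stack as the caller set it up and
     * transfers control to the real function. */
    fake[0] = OP_PUSH_IMM32;
    fake[1] = (uint8_t)(real_addr & 0xff);          /* little-endian imm32 */
    fake[2] = (uint8_t)((real_addr >> 8) & 0xff);
    fake[3] = (uint8_t)((real_addr >> 16) & 0xff);
    fake[4] = (uint8_t)((real_addr >> 24) & 0xff);
    fake[5] = OP_RET;

    /* Step 2: NOPs from there down to the cleanup and return code, so
     * threads already inside the rootkit fall through harmlessly. */
    memset(fake + STUB_LEN, OP_NOP, cleanup_off - STUB_LEN);
    return 0;
}
```

A caller would locate the fake routine and its cleanup offset by some separate diagnostic step, which this sketch leaves out.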
Hitman on Ttysnoop w/strongd

[Diagram: user app, system call vector, rootkit, kernel text. Repairs applied: surgical, surgical, core war, then hitman with core war. States shown: infected, core war repair ineffective, repaired.]

Hitman repair kills the rootkit kernel threads that defeat other repairs.

How Hitman works

I. Identify rootkit start and end addrs.
   System call table: 0xc7891011, 0xc4560004, 0xd00d0bad, 0xc1230080
   Ttysnoop start: 0xd00d0000 end: 0xd00e0000
   If rootkit not in modules list, use 4KB page that contains bad address for start and end.
II. For each process:
   Top of per-process kernel stack: 0x56780000, 0xd00d1234, 0x00001234, 0x91011121
   This could be a stored return address. (0xd00d1234 lies within the rootkit's address range.) Write invalid instruction here to kill process.
III. Kill processes.

Plan: Lay mines on path used by rootkit helpers, not on path used by good processes.

[Diagram labels: ttysnoop: fake read, helper routine.]

Checkpoint and Reboot repairs

[Diagram: timeline running from reboot through checkpoints 1, 2, 3, with infection times X, Y, Z marked.]

Problem: Xen takes ~6 seconds to restore a CP. Need more complex control to avoid attacks that prevent progress?

Typical case: Attack at time Z. VICI restores CP 3. Some loss of state.
Possible stealthy case? Infect at Y using some stealthy method VICI misses. Remains dormant until Z, VICI now detects. VICI restores CP 3, 2, 1 to reach uninfected CP.
Worst case: Infect at X, dormant until Z. Need to reboot. Massive state loss.

Part 3: new control scheme

Brooks control scheme for robots

[Diagram columns alternate: Code, Variable, Code, Variable, Code.]

Level 0: avoid collisions
  Sonar (code) -> Distance measurements (variable) -> Be scared of nearby objects (code) -> Direction to flee in (variable) -> Motor controller (code)

Key insight: the world is its own best representation.

Brooks development method:
1. Start with an initial level for the simplest behavior.
2. Test robot in real world until you get it right.
3. Add more levels.

Life-like behavior emerges from composition of levels.
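The Level 0 chain on the slide above alternates code and variables, and that alternation is easy to show in C. The following is only a toy sketch under stated assumptions: the struct layout, the eight sonar sectors, the 50 cm scare threshold, and the function names are all hypothetical and come from neither the slides nor Brooks's robots.

```c
#include <stddef.h>

#define N_SONAR           8   /* assumed number of sonar sectors */
#define SCARE_DISTANCE_CM 50  /* assumed "too close" threshold */

/* The variables of Level 0. Code stages communicate only through these. */
struct level0 {
    int distance_cm[N_SONAR]; /* variable: written by the sonar code */
    int flee_dir;             /* variable: sector to flee toward, -1 if calm */
};

/* Code: be scared of nearby objects.
 * Reads the distance measurements, writes the direction to flee in. */
void level0_be_scared(struct level0 *l)
{
    size_t nearest = 0;
    for (size_t i = 1; i < N_SONAR; i++) {
        if (l->distance_cm[i] < l->distance_cm[nearest])
            nearest = i;
    }
    if (l->distance_cm[nearest] < SCARE_DISTANCE_CM)
        l->flee_dir = (int)((nearest + N_SONAR / 2) % N_SONAR); /* opposite sector */
    else
        l->flee_dir = -1;
}

/* Code: motor controller. Returns the sector to drive toward, -1 to stay put. */
int level0_motor(const struct level0 *l)
{
    return l->flee_dir;
}
```

Keeping the intermediate results in plain, exposed variables is the design point: each stage can be tested on its own, and the state between stages is open to inspection.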
Brooks control scheme for robots

[Diagram columns alternate: Code, Variable, Code, Variable, Code.]

Level 0: avoid collisions
  Sonar (code), Distance measurements (variable), Be scared of nearby objects (code), Direction to flee in (variable), Motor controller (code)
Level 1: explore
  Pick a random direction (code), Direction to wander in (variable), Combine wander with object avoidance (code), Direction to travel in (variable)

• Higher levels can read, overwrite lower levels’ variables to use, modify their behavior.
• Lower levels cannot know about higher levels.

Brooks control scheme for VICI

[Diagram columns alternate: Code, Variable, Code, Variable, Code.]

Level 0: surgical repair
  Diagnostic: hash, value comparisons (code) -> Lists of tampered tables, text, … (variable) -> Control: if it’s bad, it needs fixing (code) -> Lists of tables, text to fix (variable) -> Repair: write back good values (code)
Level 1: core war
  Diagnostic: Identify individual bad pointers (code) -> List of bad function pointers (variable) -> Control: On repeated lvl 0 failure, do Core War (code) -> Rootkit functions to neuter (variable) -> Repair: Neuter rootkit code (code)

• Higher levels can read, overwrite lower levels’ variables to use, modify their behavior.
• Lower levels cannot know about higher levels.

Escalation and De-escalation

• Core War repair runs when Surgical repair fails once.
• “Fails once” = Surgical detects a problem on two consecutive cycles.
• Hitman follows Core War, then Checkpoint, then Reboot.

[Diagram: two timelines, each with rows HITMAN, CORE WAR, SURGICAL. Annotations: "In demo, Agent sleeps to make this ~3 secs"; "Escalation = Immediate Hitman 10X"; "delay avoided"; "De-escalation = After 10 of these… drop down to 10 of these, so long as it works."]

• Escalation optimizes response for time when faced with repeated attack.
• De-escalation backs down from expensive repairs when cheap ones work again.

Screenshot from demo

[Screenshot omitted.]

Scrolling display tracks VICI Agent’s “anger” level as Agent runs.
Red bars are cycles where VICI detected attacks.
Green bars are cycles where VICI detected no attacks.
Bar height indicates anger level.
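The escalation and de-escalation rules from the slides above can be sketched as a small per-cycle state machine in C. This is one reading of the slides, not the Agent's actual code: the names `agent_cycle`, `repair_pending`, and `CALM_AFTER` are hypothetical, and dropping exactly one level after ten consecutive successful repairs is an assumption based on the "After 10 of these" annotation.

```c
#include <stdbool.h>

/* Repair strategies in order of increasing severity, as on the slides. */
enum strategy { NO_REPAIR = -1, SURGICAL, CORE_WAR, HITMAN, CHECKPOINT, REBOOT };

#define CALM_AFTER 10 /* assumed: successes needed before backing down a level */

struct agent {
    enum strategy level; /* current "anger": strategy used on next attack */
    bool repair_pending; /* a repair ran last cycle and awaits evaluation */
    int successes;       /* consecutive successful repairs at this level */
};

/* Run one diagnostic cycle. Returns the repair to attempt, or NO_REPAIR. */
enum strategy agent_cycle(struct agent *a, bool attack_detected)
{
    if (a->repair_pending) {
        if (attack_detected) {
            /* Problem seen on two consecutive cycles: last repair failed. */
            if (a->level < REBOOT)
                a->level = (enum strategy)(a->level + 1);
            a->successes = 0;
        } else {
            /* Last repair held: count it, and calm down after enough. */
            a->successes++;
            if (a->successes >= CALM_AFTER) {
                if (a->level > SURGICAL)
                    a->level = (enum strategy)(a->level - 1);
                a->successes = 0;
            }
        }
    }
    a->repair_pending = attack_detected;
    return attack_detected ? a->level : NO_REPAIR;
}
```

Starting from `{ SURGICAL, false, 0 }`, three consecutive detections yield Surgical, Core War, then Hitman. Later attacks go straight to Hitman, which is the "delay avoided" effect, until ten successful repairs let the Agent drop back to Core War.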

VICI layers = directed acyclic graph

[Diagram: a directed acyclic graph of layers. Nodes ktables, ktext, mtext, registers, entropy, and packet carry the label Surgical at level 1; above them are Core War (2), Hitman (3), Checkpoint (4), and Reboot (5).]

Part 4: Summary and conclusion

Insights, experience so far

1. The 250ms time bound limits what you can do and how you can do it.
   • Komoku Monitoring Engine’s scripting language too slow, checks too numerous.
   • Solution: VICI Agent entirely C-based, fewer checks.
2. Xen source code availability is critical for research; otherwise not best choice.
   • Checkpoint and restore is slow.
   • Can’t checkpoint HVM machines without killing VM.
   • Perhaps better: small custom hypervisor
     - No fancy inter-domain communication interface.
     - No general-purpose OS in domain 0.
3. Brooks architecture aids incremental development as advertised, but if followed literally it discourages use of strong interfaces and of abstraction for complexity control.

Tasks completed and remaining

Prototype: Phase 1 (Goal: basic diagnosis & repair.) and Phase 2 (Goal: alternate repairs and learning.)
Tasks:
  Surgical repairs: ktables, ktext, entropy, …
  Nonsurgical repairs: Core War, Hitman, Checkpoint, Reboot
  Malware for tests: Rootsim, Ttysnoop with snoopd and strongd
  Control architecture
  Learning

Prototype: Phase 3 (Goal: meet SRS2 requirements.)
Tasks:
  Increase coverage
  Red Team exercises

Summary of accomplishments

• Demonstrated automated detection:
  + Effective against 6 categories of attack derived from real-world rootkits and current research.
  - 250ms limit is apt to limit coverage.
• Demonstrated surgical, core war, hitman, checkpoint, reboot repairs:
  + Provides effective self-healing in our tests.
  - Checkpoint, reboot repairs take too long (~6 seconds).
• Demonstrated control scheme for escalation and de-escalation:
  + Needs no complex internal representation of what a rootkit is.
  + Agent learns, reacts to current threat sophistication level.

Extra slides

What is a kernel-modifying rootkit?

[Diagram: User Apps, Jump Table, Rootkit, Kernel Text, Frequently Changing Kernel Data, Registers.]

• Adversaries install kernel-modifying rootkits after they have gained full administrative control over a machine.
• The rootkit makes the kernel lie, hiding the adversary’s presence from the real admins.
  • Hide processes, files.
  • Some rootkits also provide backdoors, TTY sniffers.
• How do rootkits modify the kernel’s behavior?
  • Replace jump table function pointers with pointers to rootkit code.
  • Modify kernel text (instructions).
  • Modify other kernel data structures (example: process table links).
  • Modify CPU registers.

Surgical Repair

[Diagram: User Apps, Jump Table, Rootkit, Kernel Text, Frequently Changing Kernel Data, Registers, annotated with Diagnostic and Repair columns. Two rows read "MD5 Hash" (diagnostic) and "Overwrite" (repair); a third, bracketed entry reads "Overwrite".]

Surgical repair essentially writes back proper values. Our coverage is presently poor.

Learning in the Brooks architecture

[Diagram columns alternate: Code, Variable, Code, Variable, Code.]

Level 1: core war
  Diagnostic: Identify individual bad pointers (code) -> List of bad function pointers (variable) -> Control: On repeated lvl 0 failure, do Core War (code) -> Rootkit functions to neuter (variable) -> Repair: Neuter rootkit code (code)

Control state (feedback can change these):
  angry = 3
  threshold = 1
  delta = 1

The algorithm is fixed:
  on level 0 failure: angry += delta
  on level 0 success: angry = 0
  on angry >= threshold: do repair

Wiring is fixed, too.

Each level has its own separate feedback function. There is no global feedback function.

Assumptions

1. In a real deployment:
   A. The Domain 0 OS would be hardened. Ours isn’t.
   B. Xen would be hardened. Ours isn’t.
   (Actually, a less featureful custom hypervisor without a general-purpose Domain 0 OS would probably be better than Xen + Debian GNU/Linux.)
2. In a real product, VICI would learn what a healthy kernel looks like by examining installation media or some non-deployed gold-standard healthy kernel. (Useful in a product but not interesting code for research.) Instead, we assume a grace period after boot during which we can snapshot the virtualized kernel in a known-good state.
3. User-mode rootkits aren’t interesting anymore. We care only about kernel-modifying rootkits.
4. An adversary can easily gain administrative control of the victim OS.

What’s a rootkit and what’s not

Rootkits make persistent modifications to the kernel in order to allow the adversary to maintain a clandestine presence on the system for days, weeks, or months.

A rootkit must have at least some useful functionality: hiding processes, files, modules, or sniffing TTYs. It must modify the kernel’s responses to all requests for relevant services made by all processes, with the possible exception of a small set of processes operated exclusively by the adversary. Alternatively, in the case of TTY sniffers, it must monitor the requests rather than modify the responses.

It is easy to add and immediately remove a kernel modification in order to avoid detection. However, that by itself is not sufficient to make a rootkit. A rootkit needs persistent modifications that operate synchronously with user requests, for example, to tamper with the results of the sys_read system call whenever any user process calls sys_read. Still, some clever rootkits make a very small set of persistent changes along strategic control-flow paths that allow them to set up and remove additional temporary changes.

A rootkit must have some means for remote control over the network (perhaps a backdoor) and/or a means for exfiltrating data over the network.