VICI - Tolerant Systems

VM Introspection for Cognitive Immunity (VICI)
Komoku, Inc.
Tuesday 18 December 2007
Talk: Tim Fraser tfraser@komoku.com
Demo: Matt Evenson mevenson@komoku.com
Agenda
1. Project status update.
2. New repair strategies.
3. New control architecture.
4. Summary and conclusions.
Copyright (C) 2007 Komoku, Inc.
The VICI approach
VICI detects kernel-modifying rootkits and repairs the infected kernel.
[Figure: VICI, Xen, and the guest kernel, with VICI’s cycle: 1. Run diagnostics; 2. Attempt repair; 3. Evaluate repair; 4. Learn.]
GOAL
1. Self-diagnosis: < 50% false negative rate; < 10% false positive rate.
2. Self-healing: repair within 250ms of infection.
3. Cognitive immunity: learn from repeated attacks. Escalate to optimize response time; de-escalate to reduce harm.
Project timeline, goals, and progress
Timeline: quarters Q1 to Q6, with milestones marked Jun 07, Dec 07, and Jun 08.
• Phase 1 prototype: basic diagnostics and repairs.
• Phase 2 prototype: add advanced repairs, Brooks-style control architecture, learning for (de)escalation.
• Phase 3 (final) prototype: increase Surgical layer coverage for Red Team exercises.
Progress towards goals
• On schedule.
• Deliverables as proposed.
• Some insight and experience gained.
• More expected from Red Team exercise.
GOAL / STATUS
1. Self-diagnosis: < 50% false negative rate; < 10% false positive rate.
2. Self-healing: repair within 250ms of infection.
   o Five effective strategies.
   o An additional one discarded.
   o CP (Checkpoint) and Reboot effective but slow.
   o Need to increase coverage of the most basic “Surgical” strategy.
   o Need to see how much we can cram into 250ms.
3. Cognitive immunity: learn from repeated attacks. Escalate to optimize response time; de-escalate to reduce harm.
   o Escalation and de-escalation work.
   o Ready for testing.
What’s new?
• New repair strategies: Core War, Hitman, Checkpoint, Reboot.
• New control architecture to map diagnoses to repairs.
• Agent learns the current threat sophistication level and adjusts how it chooses responses.
[Figure: VICI, Xen, and the guest kernel, with VICI’s cycle: 1. Run diagnostics; 2. Attempt repair; 3. Evaluate repair; 4. Learn.]
Learning the present threat level
[Figure: the VICI Agent above the VM kernel; the Agent’s mood selects the repair strategy.]
Agent mood : repair strategy
:-)  : Surgical
:-|  : Core War
:-(  : Hitman
>:-( : Checkpoint
>:-O : Reboot
• Agent gets “angry” when repairs fail repeatedly.
• An angry Agent switches to more extreme repair strategies.
• Extreme repairs may defeat clever rootkits, but they may also destroy useful kernel state (that lost state is the cost).
• Successful repairs make the Agent calm down and back down from extreme repairs.
• This escalation and de-escalation lets the Agent learn and adjust to the current level of attack sophistication.
Part 2: new repair strategies
Surgical repair on basic Ttysnoop
[Figure: user app → system call vector → kernel text. The rootkit’s hook in the system call vector (infected) is removed by a surgical repair (repaired).]
Surgical repair is simple and does not cause collateral damage.
Core War on Ttysnoop w/snoopd
[Figure: user app → system call vector → kernel text. On the infected kernel, surgical repair (applied twice) is ineffective; a core war repair leaves the kernel repaired.]
Core War repair leaves bad control flow but renders the rootkit harmless.
How Core War works
[Figure: the system call table’s sys_read entry points to Ttysnoop’s fake sys_read(), which calls the real sys_read(), prints the data if it contains a password, and returns to the caller.]
1. Core War drops in code at the top of the fake routine to jump to the real function.
• The same two-instruction code snippet works for every hooked routine: leave the stack unchanged and jump to the real function’s start address.
2. Core War writes NOPs from that point down to the beginning of the stack cleanup and return code.
• Only threads that already went through the rootkit before the repair return through these NOPs.
• Threads that arrive after the repair jump to the real function and never return to the rootkit.
Hitman on Ttysnoop w/strongd
[Figure: user app → system call vector → kernel text. On the infected kernel, surgical repairs and a core war repair are ineffective; hitman plus core war leaves the kernel repaired.]
Hitman repair kills the rootkit kernel threads that defeat other repairs.
How Hitman works
Plan: lay mines on the path used by rootkit helpers, not on the path used by good processes.
I. Identify the rootkit’s start and end addresses.
• Example system call table: 0xc7891011, 0xc4560004, 0xd00d0bad, 0xc1230080. The entry 0xd00d0bad lies inside Ttysnoop (start: 0xd00d0000, end: 0xd00e0000).
• If the rootkit is not in the modules list, use the 4KB page that contains the bad address for start and end.
II. For each process, examine the top of its per-process kernel stack.
• Example stack words: 0x56780000, 0xd00d1234, 0x00001234, 0x91011121.
• 0xd00d1234 falls inside Ttysnoop, so it could be a stored return address. Write an invalid instruction at that address to kill the process.
III. Kill processes.
[Figure: Ttysnoop’s fake read and its helper routine.]
Checkpoint and Reboot repairs
[Figure: timeline starting at a reboot, with checkpoints 1, 2, 3 and infection times X, Y, Z.]
Problem: Xen takes ~6 seconds to restore a CP. Need more complex control to avoid attacks that prevent progress?
Typical case: attack at time Z. VICI restores CP 3. Some loss of state.
Possible stealthy case? Infect at Y using some stealthy method VICI misses. The rootkit remains dormant until Z, when VICI detects it. VICI restores CP 3, 2, 1 to reach an uninfected CP.
Worst case: infect at X, dormant until Z. Need to reboot. Massive state loss.
Part 3: new control scheme
Brooks control scheme for robots
[Figure: each level is a chain of code modules linked by variables.]
Level 0: avoid collisions
Sonar (code) → distance measurements (variable) → be scared of nearby objects (code) → direction to flee in (variable) → motor controller (code).
Key insight: the world is its own best representation.
Brooks development method:
1. Start with an initial level for the simplest behavior.
2. Test robot in real world until you get it right.
3. Add more levels. Life-like behavior emerges from composition of levels.
Brooks control scheme for robots
Level 0: avoid collisions
Sonar (code) → distance measurements (variable) → be scared of nearby objects (code) → direction to flee in (variable) → motor controller (code).
Level 1: explore
Pick a random direction (code) → direction to wander in (variable) → combine wander with object avoidance (code, also reads level 0’s direction to flee in) → direction to travel in (variable, replaces the motor controller’s input).
• Higher levels can read and overwrite lower levels’ variables in order to use and modify their behavior.
• Lower levels cannot know about higher levels.
Brooks control scheme for VICI
Level 0: surgical repair
Diagnostic: hash, value comparisons (code) → lists of tampered tables, text, … (variable) → Control: if it’s bad, it needs fixing (code) → lists of tables, text to fix (variable) → Repair: write back good values (code).
Level 1: core war
Diagnostic: identify individual bad pointers (code) → list of bad function pointers (variable) → Control: on repeated level 0 failure, do Core War (code) → rootkit functions to neuter (variable) → Repair: neuter rootkit code (code).
• Higher levels can read and overwrite lower levels’ variables in order to use and modify their behavior.
• Lower levels cannot know about higher levels.
Escalation and De-escalation
• Core War repair runs when Surgical repair fails once.
• “Fails once” = Surgical detects a problem on two consecutive cycles.
• Hitman follows Core War, then Checkpoint, then Reboot. (In the demo, the Agent sleeps to make this take ~3 secs.)
[Figure: two timelines of HITMAN, CORE WAR, and SURGICAL activity.]
Escalation = immediate Hitman, 10X; the escalation delay is avoided.
De-escalation = after 10 of these… drop down to 10 of these, so long as it works.
• Escalation optimizes response time when faced with repeated attack.
• De-escalation backs down from expensive repairs when cheap ones work again.
Screenshot from demo
• The scrolling display tracks the VICI Agent’s “anger” level as the Agent runs.
• Red bars are cycles where VICI detected attacks.
• Green bars are cycles where VICI detected no attacks.
• Bar height indicates anger level.
VICI layers = directed acyclic graph
[Figure: graph whose nodes are the layers ktables, ktext, mtext, registers, entropy, and packet, plus the repair strategies Surgical, Core War, Hitman, Checkpoint, and Reboot (numbered 1 to 5).]
Part 4: Summary and conclusion
Insights, experience so far
1. The 250ms time bound limits what you can do and how you can do it.
• Komoku Monitoring Engine’s scripting language too slow, checks too numerous.
• Solution: VICI Agent entirely C-based, fewer checks.
2. Xen source code availability is critical for research; otherwise Xen is not the best choice.
• Checkpoint and restore are slow.
• HVM machines can’t be checkpointed without killing the VM.
• Perhaps better: a small custom hypervisor
- No fancy inter-domain communication interface.
- No general-purpose OS in domain 0.
3. Brooks architecture aids incremental development as advertised, but…
• if followed literally, it discourages the use of strong interfaces and abstraction for complexity control.
Tasks completed and remaining
Prototype tasks
Phase 1 (goal: basic diagnosis & repair) and Phase 2 (goal: alternate repairs and learning):
• Surgical repairs: ktables, ktext, entropy, …
• Nonsurgical repairs: Core War, Hitman, Checkpoint, Reboot
• Malware for tests: Rootsim; Ttysnoop with snoopd and strongd
• Control architecture
• Learning
Phase 3 (goal: meet SRS2 requirements):
• Increase coverage
• Red Team exercises
Summary of accomplishments
• Demonstrated automated detection:
+ Effective against 6 categories of attack derived from real-world
rootkits and current research.
- 250ms limit is apt to limit coverage.
• Demonstrated surgical, core war, hitman, checkpoint, reboot repairs:
+ Provides effective self-healing in our tests.
- Checkpoint, reboot repairs take too long (~6 seconds).
• Demonstrated control scheme for escalation and de-escalation:
+ Needs no complex internal representation of what a rootkit is.
+ Agent learns, reacts to current threat sophistication level.
Extra slides
What is a kernel-modifying rootkit?
[Figure: user apps call into the kernel through a jump table; a rootkit can tamper with the jump table, kernel text, frequently changing kernel data, and registers.]
• Adversaries install kernel-modifying rootkits after they have gained full administrative control over a machine.
• The rootkit makes the kernel lie, hiding the adversary’s presence from the real admins.
  • Hide processes, files.
  • Some rootkits also provide backdoors, TTY sniffers.
• How do rootkits modify the kernel’s behavior?
  • Replace jump table function pointers with pointers to rootkit code.
  • Modify kernel text (instructions).
  • Modify other kernel data structures (example: process table links).
  • Modify CPU registers.
Surgical Repair
[Figure: diagnostics and repairs per target. Jump table: MD5 hash diagnostic, overwrite repair. Kernel text: MD5 hash diagnostic, overwrite repair. Registers: overwrite repair. The rootkit and frequently changing kernel data are also shown.]
Surgical repair essentially writes back proper values. Our coverage is presently poor.
Learning in the Brooks architecture
Level 1: core war
Diagnostic: identify individual bad pointers (code) → list of bad function pointers (variable) → Control: on repeated level 0 failure, do Core War (code) → rootkit functions to neuter (variable) → Repair: neuter rootkit code (code).
Feedback can change these (control state):
    angry = 3
    threshold = 1
    delta = 1
The algorithm is fixed:
    on level 0 failure: angry += delta
    on level 0 success: angry = 0
    on angry >= threshold: do repair
The wiring is fixed, too.
Each level has its own separate feedback function. There is no global feedback function.
Assumptions
1. In a real deployment:
A. The Domain 0 OS would be hardened. Ours isn’t.
B. Xen would be hardened. Ours isn’t.
(Actually, a less featureful custom hypervisor without a general-purpose Domain 0
OS would probably be better than Xen + Debian GNU/Linux.)
2. In a real product, VICI would learn what a healthy kernel looks like by examining
installation media or some non-deployed gold-standard healthy kernel. (Useful in
a product but not interesting code for research.)
Instead, we assume a grace period after boot during which we can snapshot the
virtualized kernel in a known-good state.
3. User-mode rootkits aren’t interesting anymore. We care only about kernel-modifying rootkits.
4. An adversary can easily gain administrative control of the victim OS.
What’s a rootkit and what’s not
Rootkits make persistent modifications to the kernel in order to allow the adversary to maintain
a clandestine presence on the system for days, weeks, or months.
A rootkit must have at least some useful functionality: hiding processes, files, modules, or
sniffing TTYs.
It must modify the kernel’s responses to all requests for relevant services made by all
processes, with the possible exception of a small set of processes operated exclusively by the
adversary. Alternatively, in the case of TTY sniffers, it must monitor the requests rather than
modify the responses.
It is easy to add and immediately remove a kernel modification in order to avoid detection.
However, that by itself is not sufficient to make a rootkit. A rootkit needs persistent
modifications that operate synchronously with user requests, for example, to tamper with the
results of the sys_read system call whenever any user process calls sys_read. Still, some clever
rootkits make a very small set of persistent changes along strategic control-flow paths that
allow them to set up and remove additional temporary changes.
A rootkit must have some means for remote control over the network (perhaps a backdoor)
and/or a means for exfiltrating data over the network.