Intrusion Detection using Sequences of System Calls

By S. Hofmeyr & S. Forrest
Overview
Focus: privileged processes
Discriminator: system call sequences
Building a database: defining “normal”
Detecting anomalies: how to measure
Results: promising numbers
Concerns: remaining doubts
Extensions of research: Jones, Li & Lin
Inspiration
Human immune system
Recognition of self
Rejection of nonself
How would we describe “self” for a
software system, or a program?
Focus and Motivation
Focus on privileged processes
Exploitation can give a user root access
They provide a natural boundary
e.g. telnet daemon, login daemon
Privileged processes are easier to track
Specific, limited function
Stable over time
Contrast with the diversity of user actions
Where do we look?
Need to distinguish when:
Privileged process runs normally
Privileged process exhibits an anomaly
The discriminator is the observable entity
used to distinguish between these two
Use sequences of system calls as the
discriminator, the signature
How much detail?
Discriminator is sequences of system calls
Simple temporal ordering is chosen
Ignore parameters
Ignore specific timing information
Ignore everything else!
Why? As much as possible, work with
simple assumptions
Is it “enough”?
Is it enough detail?
Does the discriminator include enough
detail for this hypothesis to hold?
Answer seems to be yes!
Extra complication: due to the variability
in configuration and use of individual
systems, the set of “normal” sequences of
system calls will be different on different
systems
Design Decisions
Remember temporal ordering of calls
Not total sequence, but sequences of length k
What size should k be?
Long enough to detect anomalies, short as
possible
Empirical observation: length 6 to 10 is sufficient
So “self” is a database of (unordered) short
call sequences
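To make these decisions concrete, here is a minimal sketch (Python, not from the paper; the trace and call names are hypothetical) of building "self" as the set of length-k windows seen in a call trace:

```python
# Minimal sketch (not the authors' code): "self" is the set of all length-k
# windows of call names observed while tracing a process during normal use.
def build_normal_database(trace, k=6):
    """Return the set of length-k call sequences seen in `trace`."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}

# Hypothetical trace; a real one would come from tracing a privileged process.
trace = ["open", "read", "mmap", "mmap", "open", "read", "mmap", "close"]
normal_db = build_normal_database(trace, k=3)
```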
Building the “normal” database
Synthetic
Assurance that the normal database contains
no intrusions; reproducible
But does not reflect any particular real user
activity
Actual use
Necessary to generate from actual use in
order to have a unique “self”
How long to accumulate? Is it clean?
The normal database
Database of normal sequences does not
contain all legal sequences
If it did, anomalies would not be detected
Some rare sequences will not be used during
database initialization
Database is stored as a forest to save
space
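The slide does not spell out the forest layout, so the following is only one plausible sketch (Python; an assumption, not the authors' data structure): one trie per distinct first call, so sequences that share a prefix share storage.

```python
# Assumed forest layout (not necessarily the paper's): a dict-of-dicts trie
# in which sequences sharing a prefix share nodes, saving space over a flat set.
def insert(forest, seq):
    """Insert one length-k call sequence into the trie."""
    node = forest
    for call in seq:
        node = node.setdefault(call, {})

def contains(forest, seq):
    """Return True if the length-k sequence was seen during training."""
    node = forest
    for call in seq:
        if call not in node:
            return False
        node = node[call]
    return True

forest = {}
for seq in [("open", "read", "mmap"), ("open", "read", "close")]:
    insert(forest, seq)  # both sequences share the ("open", "read") prefix
```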
Signature Database
Structure (length 3)
[Figure: example forest of length-3 call sequences built from fopen, fread, and strcmp calls]
Derive Robust Signature Database
[Figure: "Robust Signature Database" - database size (0 to 600) vs. total sequences scanned (0 to 10,000)]
Detecting anomalies
A call sequence not in the database is an
anomalous sequence
Strength of that anomalous sequence is
measured by “Hamming distance” to the
closest normal sequence (called dmin)
Any call trace with an anomalous
sequence is an anomalous trace
Detecting anomalies
Strength of an anomalous trace is the
maximum dmin of the trace normalized for
the value of k (length of sequences in the
database):
ŜA = max{dmin values for the trace} / k
Value is between 0 and 1
By adjusting the threshold value for ŜA,
false positives can be reduced
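A short sketch of this measure (Python; the function names, toy database, and threshold are illustrative, not taken from the paper): dmin is the Hamming distance to the closest normal sequence, and the trace-level signal is the maximum dmin divided by k.

```python
# Illustrative sketch of the anomaly measure described above.
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def d_min(window, normal_db):
    """Hamming distance from a window to the closest normal sequence."""
    return min(hamming(window, seq) for seq in normal_db)

def anomaly_signal(trace, normal_db, k):
    """Normalized anomaly signal: max d_min over all windows, divided by k."""
    windows = [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]
    return max(d_min(w, normal_db) for w in windows) / k

# Hypothetical use: flag a trace whose signal exceeds a chosen threshold.
normal_db = {("open", "read", "mmap"), ("read", "mmap", "close")}
signal = anomaly_signal(["open", "read", "execve", "close"], normal_db, k=3)
is_anomalous = signal > 0.3  # threshold is illustrative; raising it reduces false positives
```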
Efficiency
Complexity of computing dmin
O(k(R_A N + 1))
k is sequence length, R_A is the ratio of anomalous to
normal sequences, N is the number of sequences in the database
dmin is calculated after every system call
The constant associated with this algorithm
is very important
Not yet running in real time
Results (synthetic)
Sanity test: If different programs are not
distinguishable, anomalies within one
program will certainly not be either
Easy to distinguish between programs;
mismatches on well more than 50% of the
system call sequences (and ŜA >= 0.6)
All intrusions (both attempted & successful)
produced anomalies of varying strengths
Results (real environment)
The conjecture of unique normal
databases
Experiments in two configurations (at UNM
and MIT) had very different databases for the
same program (lpr)
Is this typical?
Closing concerns
False positives vs false negatives
If forced to choose, UNM prefers to have false
negatives because layering can mitigate
Saw about 1 false positive per 100 print jobs (lpr)
Due to system problems
Is ŜA a good measure?
It could help generate false positives
Single extra system call might make ŜA = 0.5
Annex Material
Some UVa experiments
S. Li, Y. Lin, and A. Jones
Signature Length Has Little Effect
Illustrated by two attacks on Apache
Varied sequence length from 2 to 30
We chose length 10 to have margin of error
[Figure: normalized anomaly signal (0 to 1.2) vs. sequence length (0 to 40)]
Effectiveness: Buffer Overflow
High normalized anomaly signals indicate attacks

                        #Mismatches   %Mismatches   Normalized Anomaly Signal
Stack Overwrite             467           3.5                  0.7
Realpath Vulnerability      569           2.7                  0.6
Successfully detected buffer overflow
attacks against wu-ftpd
Work well because attacker code adds
new sequences of library calls
Effectiveness: Denial of Service
Simulated DOS attack that uses up all available memory
As attack progresses, library calls requesting memory return abnormally and are re-issued
DOS attack caused application to invoke new library call, fsync
Program - vi

              #Mismatches   %Mismatches   Normalized Anomaly Signal
Normal Run         0             0                   0
DOS Attack        101           2.6                 0.6

No intrusion detected in the normal run
High normalized anomaly signal indicates attack