Learning Rules from System Call Arguments and Sequences for Anomaly Detection
Gaurav Tandon and Philip Chan
Department of Computer Sciences
Florida Institute of Technology
Overview
• Related work in system call sequence-based systems
• Problem Statement
  – Can system call arguments as attributes improve anomaly detection algorithms?
• Approach
  – LERAD (a conditional rule-learning algorithm)
  – Variants of attributes
• Experimental evaluation
• Conclusions and future work
Related Work
• tide (time-delay embedding) – Forrest et al., 1996
• stide (sequence time-delay embedding) – Hofmeyr et al., 1999
• t-stide (stide with frequency threshold) – Warrender et al., 1999
• Variable-length sequence-based techniques (Wespi et al., 1999, 2000; Jiang et al., 2001)
False Alarms !!
Problem Statement
Current models use system call sequences.
What else can we model? System call arguments, e.g.:
open(“/etc/passwd”)
open(“/users/readme”)
Approach
• Models based upon system calls
• 3 sets of attributes:
  - system call sequence
  - system call arguments
  - system call arguments + sequence
• Adopt a rule-learning approach: Learning Rules for Anomaly Detection (LERAD)
Learning Rules for Anomaly Detection
(LERAD) [Mahoney and Chan, 2003]
A = a, B = b, ... ⇒ X ∈ {x1, x2, ...}
A, B, and X are attributes; a, b, x1, x2 are values of the corresponding attributes.
Example: SC = close(), Arg1 = 123 ⇒ Arg2 ∈ {abc, xyz}
p = Pr(X ∉ {x1, x2, ...} | A = a, B = b, ...) ≈ r / n
p - probability of observing a value not in the consequent
r - cardinality of the set {x1, x2, ...} in the consequent
n - number of samples that satisfy the antecedent
AnomalyScore = 1 / p = n / r
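As a rough illustration (not the authors' code), the sketch below shows how one such rule could be stored and how the estimate p = r/n and the score 1/p = n/r might be computed; the class and field names are invented for this example.

```python
class Rule:
    """One LERAD-style rule: antecedent conditions => target attribute
    must take a value already seen in training (illustrative sketch)."""

    def __init__(self, antecedent, target):
        self.antecedent = antecedent   # e.g. {"SC": "close()", "Arg1": "123"}
        self.target = target           # consequent attribute, e.g. "Arg2"
        self.allowed = set()           # consequent values {x1, x2, ...} seen so far
        self.n = 0                     # training samples matching the antecedent

    def matches(self, sample):
        return all(sample.get(a) == v for a, v in self.antecedent.items())

    def train(self, sample):
        if self.matches(sample):
            self.n += 1
            self.allowed.add(sample[self.target])

    def p_violation(self):
        # p = Pr(X not in {x1, x2, ...} | antecedent), estimated as r / n
        return len(self.allowed) / self.n if self.n else 1.0

    def anomaly_score(self, sample):
        # 1/p = n/r when a never-seen consequent value appears, else 0
        if self.matches(sample) and sample[self.target] not in self.allowed:
            return 1.0 / self.p_violation()
        return 0.0
```

Under this sketch, the example rule above would be Rule({"SC": "close()", "Arg1": "123"}, "Arg2") after training on records represented as attribute dictionaries.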
Overview of LERAD
Four steps are involved in rule generation:
1. From a small training sample, generate candidate rules and associate probabilities with them
2. Coverage test to minimize the rule set
3. Update rules beyond the small training sample
4. Validate rules on a separate validation set
Step 1a: Generate Candidate Rules
Training Data:
                     A  B  C  D
S1 (random sample)   1  2  3  4
S2 (random sample)   1  2  3  5
S3 (random sample)   6  7  8  4
S4 (training)        1  0  9  5
S5 (training)        1  2  3  4
S6 (validation)      6  3  8  5
• Two samples are picked at random (say S1 and S2)
• Matching attributes A, B, and C are picked in random order (say B, C, and A)
• These attributes are used to form rules with 0, 1, and 2 conditions in the antecedent
Rule 1: * ⇒ B ∈ {2}
Rule 2: C = 3 ⇒ B ∈ {2}
Rule 3: A = 1, C = 3 ⇒ B ∈ {2}
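A minimal sketch of Step 1a as read from this slide: pick two samples, collect their matching attributes in random order, fix the first as the consequent, and emit rules with 0, 1, and 2 antecedent conditions. The function name and the dict-of-attributes representation are assumptions, not from the paper.

```python
import random

def candidate_rules(s1, s2, max_conditions=2):
    """Sketch of LERAD Step 1a: rules suggested by two matching samples."""
    matching = [a for a in s1 if s1[a] == s2[a]]   # attributes with equal values
    random.shuffle(matching)                       # e.g. the order B, C, A
    if not matching:
        return []
    target = matching[0]                           # consequent attribute (B)
    rules = []
    for k in range(min(max_conditions, len(matching) - 1) + 1):
        conditions = {a: s1[a] for a in matching[1:1 + k]}   # 0, 1, then 2 conditions
        rules.append(Rule(conditions, target))               # Rule sketch from above
    return rules

# e.g. candidate_rules({"A": 1, "B": 2, "C": 3, "D": 4},
#                      {"A": 1, "B": 2, "C": 3, "D": 5})
# can yield * => B, C=3 => B, and A=1, C=3 => B, depending on the shuffle.
```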
Step 1b: Generate Candidate Rules
(Training data: samples S1-S6 as in the table above.)
• Values are added to the consequent based on a subset of the training set (say S1-S3)
• A probability estimate p of the rule being violated (the consequent value is not in the set) is associated with every rule
• Rules are sorted in increasing order of p
Rule 2: C = 3 ⇒ B ∈ {2}   [p = 1/2]
Rule 3: A = 1, C = 3 ⇒ B ∈ {2}   [p = 1/2]
Rule 1: * ⇒ B ∈ {2, 7}   [p = 2/3]
Step 2: Coverage Test
(Training data: samples S1-S6 as in the table above; Rule 2 covers B = 2 in S1 and S2, Rule 1 covers B = 7 in S3.)
• Obtain a minimal set of rules
Rule 2: C = 3 ⇒ B ∈ {2}   [p = 1/2]
Rule 3: A = 1, C = 3 ⇒ B ∈ {2}   [p = 1/2]
Rule 1: * ⇒ B ∈ {2, 7}   [p = 2/3]
Rule 3 covers nothing that Rule 2 does not already cover, so it is dropped; Rules 1 and 2 survive.
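One plausible reading of the coverage test, continuing the sketches above: evaluate rules from lowest to highest p and keep a rule only if it covers some training value that no earlier rule already covers (here Rule 3 adds nothing beyond Rule 2). Mahoney and Chan's exact criterion may differ in detail.

```python
def coverage_test(rules, samples):
    """Sketch of Step 2: keep a minimal rule set, ordered by increasing p."""
    kept, covered = [], set()
    for rule in sorted(rules, key=lambda r: r.p_violation()):
        explained = {(i, rule.target)
                     for i, s in enumerate(samples)
                     if rule.matches(s) and s.get(rule.target) in rule.allowed}
        if explained - covered:        # rule covers something not yet covered
            covered |= explained
            kept.append(rule)
    return kept
```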
Step 3: Updating rules beyond the training samples
(Training data: samples S1-S6 as in the table above.)
• Extend rules to the entire training (minus validation) set (samples S1-S5)
Rule 2: C = 3 ⇒ B ∈ {2}   [p = 1/3]
Rule 1: * ⇒ B ∈ {2, 7, 0}   [p = 3/5]
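A sketch of Step 3 under the same assumptions: the surviving rules are re-estimated on the whole training portion (S1-S5), which grows the consequent sets and updates n, and hence p.

```python
def extend_rules(rules, training_samples):
    """Sketch of Step 3: update each kept rule on the full training set."""
    for rule in rules:
        rule.allowed.clear()
        rule.n = 0
        for sample in training_samples:
            rule.train(sample)         # adds any newly observed consequent values
    return rules
```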
Step 4: Validating rules
(Training data: samples S1-S6 as in the table above.)
• Test the set of rules on the validation set (S6)
• Remove rules that produce an anomaly
Rule 2: C = 3 ⇒ B ∈ {2}   [p = 1/3]
Rule 1: * ⇒ B ∈ {2, 7, 0}   [p = 3/5]
On S6, Rule 2 is not triggered (C = 8), but Rule 1 flags B = 3 ∉ {2, 7, 0} as anomalous, so Rule 1 is removed and Rule 2 is retained.
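A sketch of Step 4 under the same assumptions: any rule that raises an anomaly on the held-out validation portion (attack-free data) is dropped, since it would only generate false alarms.

```python
def validate_rules(rules, validation_samples):
    """Sketch of Step 4: discard rules that fire on the validation set."""
    return [rule for rule in rules
            if all(rule.anomaly_score(s) == 0.0 for s in validation_samples)]
```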
Learning Rules for Anomaly Detection
(LERAD)
Non-stationary model
- only the last occurrence of an event is important
TotalAnomalyScore = Σ_i t_i / p_i = Σ_i (t_i · n_i / r_i)
t_i - time interval since the last anomalous event for rule i
i - index of a violated rule
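A sketch of how the non-stationary score could be accumulated at detection time, assuming the Rule objects from the sketches above and per-rule timestamps; the bookkeeping of "time since this rule last fired" is my reading of t_i.

```python
def total_anomaly_score(rules, sample, now, last_fired):
    """Sketch: sum t_i * n_i / r_i over all rules violated by this sample."""
    score = 0.0
    for i, rule in enumerate(rules):
        if rule.anomaly_score(sample) > 0.0:
            t_i = now - last_fired.get(i, 0.0)   # time since rule i last fired
            score += t_i * rule.n / max(len(rule.allowed), 1)
            last_fired[i] = now                  # only the last occurrence matters
    return score
```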
Variants of attributes
• 3 variants:
(i) S-LERAD: system call sequence
(ii) A-LERAD: system call arguments
(iii) M-LERAD: system call arguments + sequence
S-LERAD
• System call sequence-based LERAD
• Samples comprising 6 contiguous system call tokens are input to LERAD

SC1        SC2        SC3        SC4        SC5       SC6
mmap()     munmap()   mmap()     munmap()   open()    close()
munmap()   mmap()     munmap()   open()     close()   open()
mmap()     munmap()   open()     close()    open()    mmap()

Example rule: SC1 = mmap(), SC2 = munmap() ⇒ SC6 ∈ {close(), mmap()}
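A sketch of how the 6-call tuples above could be produced: slide a window of width 6 over the per-application trace, one LERAD sample per position. Names are illustrative.

```python
def sequence_tuples(trace, width=6):
    """S-LERAD sketch: overlapping 6-call windows become LERAD samples."""
    return [{f"SC{j + 1}": trace[i + j] for j in range(width)}
            for i in range(len(trace) - width + 1)]

calls = ["mmap()", "munmap()", "mmap()", "munmap()", "open()", "close()",
         "open()", "mmap()"]
samples = sequence_tuples(calls)   # the three overlapping rows shown above
```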
A-LERAD
• Samples contain the system call along with its arguments
• The system call is always a condition in the antecedent of the rule

Attributes: SC, Arg1, Arg2, Arg3, Arg4, Arg5

Example rule: SC = munmap() ⇒ Arg1 ∈ {0x134, 0102, 0x211, 0x124}
M-LERAD
• Combination of system call sequences and arguments

Example rule: SC1 = close(), Arg1 = 0x134 ⇒ SC3 ∈ {munmap()}
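A sketch of an M-LERAD record, assuming each sample simply joins the 6-call window with the current call's argument fields; the attribute naming is assumed, not taken from the paper.

```python
def merged_sample(window, args):
    """M-LERAD sketch: one record with both sequence and argument attributes."""
    record = {f"SC{j + 1}": sc for j, sc in enumerate(window)}
    record.update({f"Arg{k + 1}": a for k, a in enumerate(args)})
    return record
```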
1999 DARPA IDS Evaluation
[Lippmann et al, 2000]
• Week 3 – training data (~2.1 million system calls)
• Weeks 4 and 5 – test data (over 7 million system calls)
• Total of 51 attacks on the Solaris host
Experimental Procedures
• Preprocessing the data: the BSM audit log is split into processes (Pi, Pj, ..., Pk), and each process is mapped to its application (Application 1, Application 2, ..., Application N)
• One model per application
• Merge all alarms
Evaluation Criteria
• An attack is detected if an alarm is generated within 60 seconds of the attack's occurrence
• Number of attacks detected at 10 false alarms/day
• Time and storage requirements
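A small sketch of the detection criterion as I read it: an attack counts as detected if some alarm timestamp falls within 60 seconds of the attack's occurrence; the actual DARPA scoring procedure is more involved.

```python
def attacks_detected(alarm_times, attack_times, window=60.0):
    """Count attacks with at least one alarm within `window` seconds (sketch)."""
    return sum(any(abs(alarm - attack) <= window for alarm in alarm_times)
               for attack in attack_times)
```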
Detections vs. false alarms
[Figure: Attacks detected (0-35) vs. false alarms per day (1, 5, 10, 50, 100) for M-LERAD, A-LERAD, t-stide, stide, S-LERAD, and tide]
Percentage detections per attack type
[Figure: Percentage of attacks detected per attack type - Probes (5), DOS (19), R2L (12), U2R (9), Data (4), Data-U2R (2) - for tide, stide, t-stide, S-LERAD, A-LERAD, and M-LERAD]
Comparison of CPU times
Application   Training time (s)        Testing time (s)
              [1 week of data]         [2 weeks of data]
              t-stide   M-LERAD        t-stide   M-LERAD
ftpd          0.2       1.0            0.2       1.0
telnetd       1.0       7.9            1.0       9.8
ufsdump       6.8       33.3           0.4       1.8
tcsh          6.3       32.8           5.9       37.6
login         2.4       16.7           2.4       19.9
sendmail      2.7       15.1           3.2       21.6
quota         0.2       3.5            0.2       3.8
sh            0.2       3.2            0.4       5.6
Storage Requirements
• More data extracted (system calls + arguments) means more space
• Only during training, so it can be done offline
• Small rule set vs. large database (stide, t-stide)
• e.g., for the tcsh application:
  1.5 KB file for the set of rules (M-LERAD)
  5 KB for the sequence database (stide)
Summary of contributions
• Introduced argument information to model systems
• Enhanced LERAD to form rules with system calls as pivotal attributes
• LERAD with argument information detects more attacks than existing system call sequence-based algorithms (tide, stide, t-stide)
• The sequence + argument based system (M-LERAD) generally detected the most attacks across different false alarm rates
• Argument information alone can be used effectively to detect attacks at lower false alarm rates
• Lower memory requirements during detection compared to sequence-based techniques
Future Work
• More $$$$$$$$$$
Future Work
• A richer representation: more attributes, e.g. time between consecutive system calls
• Anomaly score: t-stide vs. LERAD
Thank You