Learning Rules for Anomaly Detection of Hostile Network Traffic

advertisement
Learning Rules for Anomaly
Detection of Hostile
Network Traffic
Matthew V. Mahoney and Philip K. Chan
Florida Institute of Technology
Problem: How to detect novel
intrusions in network traffic given
only a model of normal traffic
 Normal
web server request
GET /index.html HTTP/1.0
 Code
Red II worm
GET /default.ida?NNNNNNNNN…
What has been done

Firewalls


Signature Detection (SNORT, BRO)



Can’t block attacks on open ports (web, mail, DNS)
Hand coded rules (search for “default.ida?NNN”)
Can’t detect new attacks
Anomaly Detection (eBayes, ADAM, SPADE)


Learn rules from normal traffic for low-level protocols
(IP, TCP, ICMP)
But application protocols (HTTP, mail) are too hard to
model
Learning Rules for Anomaly
Detection (LERAD)

Associative mining (APRIORI, etc.) learns rules
with high support and confidence for one value
 LERAD learns rules with high support (n) and a
small set of allowed values (r)
 Any value seen at least once in training is
allowed
If port = 80 and word1 = “GET” then
word3  {“HTTP/1.0”, “HTTP/1.1”}
(r = 2)
LERAD Steps
1.
2.
3.
Generate candidate rules
Remove redundant rules
Remove poorly trained rules
LERAD is fast because steps 1-2 can be
done on a small random sample (~100
tuples)
Step 1. Generate Candidate Rules
Suggested by matching attribute values
Sample
Port
Word1
Word2
Word3
S1
80
GET
/index.html
HTTP/1.0
S2
80
GET
/banner.gif
HTTP/1.0
S3
25
HELO
pascal
MAIL

S1 and S2 suggest:




port = 80
if port = 80 then word1 = “GET”
if word3 = “HTTP/1.0” and word1 = “GET then port = 80
S2 and S3 suggest no rules
Step 2. Remove Redundant Rules
Favor rules with higher score = n/r
Sample
Port
Word1
Word2
Word3
S1
80
GET
/index.html
HTTP/1.0
S2
80
GET
/banner.gif
HTTP/1.0
S3
25
HELO
pascal
MAIL
Rule 1: if port = 80 then word1 = “GET” (n/r =
2/1)
Rule 2: if word2 = “/index.html” then word1 =
“GET” (n/r = 1/1)
Rule 2 has lower score and covers no new values,
so it is redundant
Step 3. Remove Poorly Trained Rules
Rules with violations in a validation set will probably
generate false alarms
r (number of
allowed values)
Incompletely trained
rule (removed)
Fully trained
rule (kept)
Train
Validate
Test
Attribute Sets

Inbound client
packets (PKT)

IP packet cut into 24
16-bit fields

Inbound client TCP
streams





Date, time
Source, destination IP
addresses and ports
Length, duration
TCP flags
First 8 application
words
Anomaly score = tn/r summed over violated rules,
t = time since previous violation
Experimental Evaluation

1999 DARPA/Lincoln Laboratory Intrusion
Detection Evaluation (IDEVAL)




Train on week 3 (no attacks)
Test on inside sniffer weeks 4-5 (148 simulated probes,
DOS, and R2L attacks)
Top participants in 1999 detected 40-55% of attacks at
10 false alarms per day
2002 university departmental server traffic (UNIV)



623 hours over 10 weeks
Train and test on adjacent weeks (some unlabeled
attacks in training data)
6 known real attacks (some multiple instances)
Experimental Results
Percent of attacks detected at 10 false alarms per day
70
60
50
40
30
20
10
0
IDEVAL PKT
IDEVAL TCP
UNIV PKT
UNIV TCP
UNIV Detection/False Alarm
Tradeoff
Percent of attacks
detected
Percent of attacks detected at 0 to 40 false alarms per day
100
80
Comb
60
TCP
40
PKT
20
0
0
10
20
30
False alarms per day per detector
40
Run Time Performance
(750 MHz PC – Windows Me)
 Preprocess
9 GB IDEVAL traffic = 7 min.
 Train + test < 2 min. (all systems)
Anomalies are due to bugs and
idiosyncrasies in hostile code
No obvious way to distinguish from benign events
UNIV attack
Inside port scan
Code Red II worm
Nimda worm
Scalper worm
Proxy scan
DNS version probe
How detected
HEAD / HTTP\1.0 (backslash)
TCP segmentation after GET
host: www
host: unknown
host: www.yahoo.com
(not detected)
Contributions
 LERAD
differs from association mining in
that the goal is to find rules for anomaly
detection: a small set of allowed values
 LERAD is fast because rules are
generated from a small sample
 Testing is fast (50-75 rules)
 LERAD improves intrusion detection


Models application protocols
Detects more attacks
Thank you
Download