Reverse Hashing for High-speed Network Monitoring: Algorithms,

A DoS Resilient Flow-level Intrusion
Detection Approach for High-speed
Networks
Yan Gao, Zhichun Li, Yan Chen
Department of EECS, Northwestern University
Presented By
Sudarsan Vinay Maddi
Christopher Brandon Barkley
Outline
Motivation
 Background on Sketches
 Design of the HiFIND system
 Evaluation
 Conclusion

The Problem

The increasing frequency, severity,
and sophistication of viruses make
it critical to detect outbreaks at
routers and gateways instead of end
hosts.
Current Intrusion Detection
Systems
Signature-based Detection
 Anomaly-based Detection

Signature-based Intrusion
Detection




Examples: BRO, Snort
Perform pattern-matching and report
situations that match known attack
types.
Advantage: Accurately detects known
attack types.
Disadvantage: Attackers can modify or
create attacks that avoid detection until a
software update.
Anomaly-based Intrusion Detection




Example: Manhunt
Build a model of acceptable behavior and
flag exceptions using heuristics.
Advantage: Model is built according to
actual use and can detect previously
unknown attacks.
Disadvantage: Heuristic model can lead
to false positives, system is inaccurate in
the beginning (when it has little
information).
Existing Network IDSes Insufficient


Signature-based IDSes cannot recognize unknown or
polymorphic intrusions
Statistical IDSes come to the rescue, but:
 Flow-level detection: unscalable and vulnerable to DoS attacks
  e.g. TRW [IEEE SSP 04], TRW-AC [USENIX Security Symposium 04],
  Superspreader [NDSS 05] for port scan detection
 Overall traffic based detection: inaccurate, high false positives
  e.g. Change Point Monitoring for flooding attack
  detection [IEEE Trans. on DSC 04]
Existing Network IDSes Insufficient

Key features missing


Distinguish SYN flooding and various port scans
for effective mitigation
Aggregated detection over multiple vantage
points
Other Limitations



Another limitation of existing IDSes is
that they are implemented in
software.
Software-based data recording has
trouble keeping up with the link speeds of
high-speed routers.
To solve this, data recording must be
implementable in hardware.
HiFIND System
The main goal is to develop an accurate High-speed Flow-level Intrusion Detection (HiFIND)
system
 Leverage the data streaming techniques:
reversible sketches
 Select an optimal small set of metrics from
TCP/IP headers for monitoring and detection
 Aggregate compact sketches from multiple
routers for distributed detection
Goals of HiFIND
Scalable to flow-level detection on
high speed networks
 DoS resilient
 Distinguish SYN flooding from port
scans
 Enable aggregate detection over
multiple gateways.
 Separate anomalies to limit false
positives.

Deployment of HiFIND

Attached to a router/switch as a black box
Edge network detection particularly
powerful
[Figure: deployment diagrams showing Internet, router, switch, LAN, splitter, scan port, and HiFIND system.
Panels: (a) Original configuration; (b) Monitor each port separately; (c) Monitor aggregated traffic from all ports.]
Outline
Motivation
 Background on Sketches
 Design of the HiFIND system
 Evaluation
 Conclusion

Reversible Sketches



Traditional sketches do not store key
information, making it hard to infer a culprit
flow.
Reversible sketches use a reversible hashing
function to infer keys of culprits without
storing explicit key information.
More info: Reversible Sketches for Efficient
and Accurate Change Detection over
Network Data Streams by Schweller, Gupta,
Parsons, and Chen of Northwestern
University.
Two Dimensional k-ary Sketch
Instead of using one-dimensional
hash table, use a 2D hash table
matrix.
 Allows distinguishing between types
of attacks by keeping track of more
information.
 Ex. Columns are a hash of
{SIP,DIP}, rows are a hash of Dport.

Outline
Motivation
 Background on Sketches
 Design of the HiFIND system





Architecture
Sketch-based intrusion detection
Intrusion classification with 2D sketches
Feature analysis
Evaluation
 Conclusion

Architecture of the HiFIND system
[Figure: HiFIND architecture.
Recording stage: real traffic stream, sketch recording, reversible sketch & 2D sketch; reversible sketches & 2D sketches from other routers; aggregated reversible sketch; aggregated 2D sketch.
Detection stage: time series analysis methods, forecast sketch, forecast error sketch; Phase 1: threshold based detection; Phase 2: intrusion classification; Phase 3: false positive reduction; attack mitigation.]
Architecture of the HiFIND system

Threat model


TCP SYN flooding (DoS attack)
Port scan
Horizontal scan
 Vertical scan
 Block scan
Forecast methods



EWMA
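A minimal sketch of EWMA forecasting applied to per-bucket counters, to illustrate how a forecast and a forecast error could be derived per interval and then compared against a threshold. The smoothing factor `alpha`, the threshold value, and the function name are illustrative assumptions; the slides do not give the parameters HiFIND uses.

```python
def ewma_forecast_errors(observed, alpha=0.5):
    """Return a list of (forecast, errors) pairs, one per interval after the first.

    observed: list of equally sized counter lists, one per time interval.
    alpha:    smoothing factor in (0, 1]; 0.5 is an arbitrary illustrative value.
    """
    forecast = None
    results = []
    for counts in observed:
        if forecast is None:
            # No history yet: seed the forecast with the first observation.
            forecast = [float(c) for c in counts]
            continue
        # Forecast error = observed value minus forecast value, per bucket.
        errors = [c - f for c, f in zip(counts, forecast)]
        results.append((forecast, errors))
        # EWMA update: blend the new observation into the running forecast.
        forecast = [alpha * c + (1 - alpha) * f for c, f in zip(counts, forecast)]
    return results


# Three intervals of a three-bucket counter array; bucket 2 jumps in the last one.
intervals = [[10, 10, 10], [10, 12, 10], [10, 11, 90]]
results = ewma_forecast_errors(intervals, alpha=0.5)

THRESHOLD = 50  # hypothetical detection threshold
flagged = [[i for i, e in enumerate(errors) if e > THRESHOLD]
           for _, errors in results]
print(flagged)
```

Only the bucket whose forecast error exceeds the threshold is reported; small fluctuations are absorbed by the forecast.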
Sketch-based Detection Algorithm
Keys           SYN flooding   Hscan   Vscan   Score
{SIP, Dport}   non-spoofed    Yes     No      1.5
{DIP, Dport}   Yes            No      No      1
{SIP, DIP}     non-spoofed    No      Yes     1.5
{SIP}          non-spoofed    Yes     Yes     2.5
{DIP}          Yes            No      Yes     2
{Dport}        Yes            Yes     No      2
Sketch-based Detection Algorithm

RS({DIP, Dport}, #SYN - #SYN/ACK)
 Detect SYN flooding attacks

RS({SIP, DIP}, #SYN - #SYN/ACK)
 Detect any intruder trying to attack a
 particular IP address

RS({SIP, Dport}, #SYN - #SYN/ACK)
 Detect any source IP which causes a large
 number of uncompleted connections to a
 particular destination port
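The recorded value #SYN - #SYN/ACK can be illustrated with a plain (non-reversible) k-ary sketch: each SYN adds 1 to the key's bucket in every hash table and each SYN/ACK subtracts 1, so completed handshakes cancel out. This is a sketch under stated assumptions: the class name, table sizes, and the salted BLAKE2 hash are placeholders, and the reversible hashing that lets HiFIND recover keys is not shown.

```python
import hashlib
from statistics import median


class KarySketch:
    """Plain k-ary sketch: H hash tables with K buckets each."""

    def __init__(self, num_tables=5, num_buckets=1024):
        self.H = num_tables
        self.K = num_buckets
        self.tables = [[0] * num_buckets for _ in range(num_tables)]
        self.total = 0  # sum of all updates, used by the estimator

    def _bucket(self, i, key):
        # Salted BLAKE2 stands in for the i-th hash function (an assumption).
        digest = hashlib.blake2b(repr(key).encode(), digest_size=8,
                                 salt=i.to_bytes(2, "little")).digest()
        return int.from_bytes(digest, "little") % self.K

    def update(self, key, delta):
        for i in range(self.H):
            self.tables[i][self._bucket(i, key)] += delta
        self.total += delta

    def estimate(self, key):
        # Standard k-ary sketch estimator: unbiased per table, median across tables.
        per_table = [
            (self.tables[i][self._bucket(i, key)] - self.total / self.K)
            / (1 - 1 / self.K)
            for i in range(self.H)
        ]
        return median(per_table)


sketch = KarySketch()
victim = ("9.7.2.3", 80)    # {DIP, Dport} receiving unanswered SYNs
normal = ("9.7.2.4", 443)   # {DIP, Dport} whose handshakes complete

for _ in range(1000):
    sketch.update(victim, +1)   # SYN with no SYN/ACK
for _ in range(200):
    sketch.update(normal, +1)   # SYN
    sketch.update(normal, -1)   # matching SYN/ACK cancels it

print(sketch.estimate(victim), sketch.estimate(normal))
```

In a real deployment the SYN/ACK travels in the reverse direction, so the recorder would have to map its source address and port back to the connection's {DIP, Dport} key before applying the -1 update.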
Intrusion Classification

Major challenge
 Cannot completely differentiate different types of
 attacks
 E.g., if the destination port distribution is unknown, it is
 hard to distinguish non-spoofed SYN flooding
 attacks from vertical scans by
 RS({SIP, DIP}, #SYN - #SYN/ACK)
Intrusion Classification

Bi-modal distribution
[Figure labels: SYN floodings, Vertical scans]
Two-dimensional (2D) Sketch
For example: differentiate vertical scan from SYN flooding attack
 The two-dimensional k-ary sketches
[Figure: H two-dimensional hash matrices, each with Kx columns indexed by hx(SIP,DIP) and Ky rows indexed by hy(Dport).]

An example of UPDATE operation
Packet {2.3.0.5, 9.7.2.3, 80, SYN}: add +1 to the cell in column hx(2.3.0.5, 9.7.2.3) and row hy(80)
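The UPDATE operation on a 2D sketch can be sketched as follows, together with one plausible way to use the column of a suspicious {SIP, DIP} pair: if nearly all of the column's weight sits in one row, the unanswered SYNs target one port (SYN flooding); if it is spread over many rows, many ports are being probed (vertical scan). The class, the hash function, the matrix sizes, and the concentration measure are illustrative assumptions, not the classification rule from the paper.

```python
import hashlib
from statistics import median


def _h(tag, key, modulus):
    # BLAKE2 over a tagged key stands in for hx / hy (an assumption).
    digest = hashlib.blake2b(repr((tag, key)).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % modulus


class TwoDSketch:
    """H matrices; columns indexed by hx(SIP, DIP), rows by hy(Dport)."""

    def __init__(self, num_matrices=5, kx=1024, ky=64):
        self.H, self.kx, self.ky = num_matrices, kx, ky
        # matrices[i][column][row]
        self.matrices = [[[0] * ky for _ in range(kx)]
                         for _ in range(num_matrices)]

    def update(self, sip, dip, dport, delta=1):
        for i in range(self.H):
            col = _h(("x", i), (sip, dip), self.kx)
            row = _h(("y", i), dport, self.ky)
            self.matrices[i][col][row] += delta

    def column(self, i, sip, dip):
        return self.matrices[i][_h(("x", i), (sip, dip), self.kx)]


def concentration(sketch, sip, dip, top=1):
    """Median over matrices of the share of a column held by its top rows."""
    shares = []
    for i in range(sketch.H):
        col = sketch.column(i, sip, dip)
        total = sum(col)
        top_sum = sum(sorted(col, reverse=True)[:top])
        shares.append(top_sum / total if total > 0 else 0.0)
    return median(shares)


sketch = TwoDSketch()
for _ in range(500):                      # 500 SYNs to one port
    sketch.update("2.3.0.5", "9.7.2.3", 80)
for port in range(1, 501):                # one SYN to each of 500 ports
    sketch.update("1.1.1.1", "9.7.2.3", port)

flood_share = concentration(sketch, "2.3.0.5", "9.7.2.3")
scan_share = concentration(sketch, "1.1.1.1", "9.7.2.3")
print(flood_share, scan_share)
```

The flooding pair yields a share close to 1, while the scanning pair yields a small share because its updates are hashed across many rows.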
DoS Resilience Analysis

HiFIND system is resilient to various DoS attacks, as
follows:
 Send source-spoofed SYN packets to a fixed destination
  Detected as a SYN flooding attack
 Send source-spoofed packets to random destinations
  Evenly distributed in the buckets of each hash table, no
  false positives
 Reverse-engineer the hash functions to create collisions
  Difficult to reverse-engineer the hash functions
   Unknown hash output of each hash function
   Multiple hash tables and different hash functions
  Even if the hash functions of the sketches are known
   Very hard to find collisions through exhaustive search
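The first two cases can be illustrated with a small simulation: packets sent to random destinations spread thinly over the buckets of a hash table, while the same number of packets sent to one fixed destination pile up in a single bucket. This is a sketch under stated assumptions: a single table keyed on {DIP, Dport}, a BLAKE2 hash, and the packet counts are all placeholders chosen for illustration.

```python
import hashlib
import random

NUM_BUCKETS = 1024


def bucket(dip, dport):
    # BLAKE2 stands in for one sketch hash function (an assumption).
    key = f"{dip}:{dport}".encode()
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "little") % NUM_BUCKETS


def record(packets):
    """Count one unanswered SYN per packet in a single hash table."""
    table = [0] * NUM_BUCKETS
    for dip, dport in packets:
        table[bucket(dip, dport)] += 1
    return table


rng = random.Random(0)  # fixed seed so the run is repeatable
N = 20000

# Spoofed packets to random destinations: random 32-bit DIP, random port.
random_dests = [(rng.getrandbits(32), rng.randrange(1, 65536)) for _ in range(N)]
# Spoofed packets to a fixed destination: 9.7.2.3, port 80.
fixed_dest = [(0x09070203, 80)] * N

spread = record(random_dests)
flood = record(fixed_dest)
print(max(spread), max(flood))
```

With a per-bucket threshold well above the average load, the random-destination traffic raises no alarm, while the fixed-destination traffic stands out in one bucket.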
Distributed Intrusion Detection
• Naive solution:
Transport all the packet traces or connection states to
the central site
• HiFIND:
Summarize the traffic with compact sketches at each
edge router, and deliver them to the central site
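Because sketch updates are additive, sketches recorded at different routers can be combined at the central site by adding them bucket by bucket, provided every router uses the same hash functions and table dimensions. The following is a minimal sketch of that step; the function name, the tiny table sizes, and the threshold are illustrative assumptions.

```python
def combine(sketches):
    """Element-wise sum of equally shaped sketches (lists of hash tables)."""
    first = sketches[0]
    combined = [[0] * len(table) for table in first]
    for sketch in sketches:
        same_shape = len(sketch) == len(first) and all(
            len(t) == len(f) for t, f in zip(sketch, first))
        if not same_shape:
            raise ValueError("all sketches must have the same dimensions")
        for i, table in enumerate(sketch):
            for j, value in enumerate(table):
                combined[i][j] += value
    return combined


# Two routers, each with 2 hash tables of 4 buckets (tiny, for illustration).
# The same key hashes to bucket 2 in table 0 and bucket 1 in table 1.
router_a = [[0, 1, 30, 0], [2, 30, 0, 0]]
router_b = [[1, 0, 40, 0], [0, 40, 0, 3]]

THRESHOLD = 50  # hypothetical per-bucket detection threshold
aggregated = combine([router_a, router_b])
print(aggregated)
```

Neither router alone exceeds the threshold for this key (30 and 40), but the aggregated sketch does (70), which is the case where combining sketches from multiple vantage points pays off.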
Outline
Motivation
 Background on Sketches
 Design of the HiFIND system
 Evaluation
 Conclusion

Evaluation Methodology

Router traffic traces
 Lawrence Berkeley National Laboratory
  One-day trace with ~900M netflow records
 Northwestern University
  One day experiment in May 2005 with 239M
  netflow records, 1.8TB traffic and 1:1 packet
  samples
Evaluation metrics
 Detection accuracy
 Online performance:
  Speed
  Memory consumption
  Memory access per packet
Highly Accurate
[Figure: HiFIND architecture diagram, repeated. Labels: recording stage, aggregated reversible sketch, aggregated 2D sketch, time series analysis methods, forecast sketch, forecast error sketch, Phase 1: threshold based detection, Phase 2: intrusion classification, Phase 3: false positive reduction, attack mitigation.]
Detection Validation

SYN flooding
 Backscatter
Hscans and Vscans
 The knowledge of port number
e.g. 5 major scenarios of the top 10 Hscans
Anonymized SIP    Dport   # DIP   Cause
204.10.110.38     1433    56275   SQLSnake scan
5.4.247.103       1433    54788   SQLSnake scan
109.132.101.199   22      45014   Scan SSH
95.30.62.202      3306    25964   MySQL Bot scans
15.192.50.153     4899    23687   Rahack worm
Detection Validation
e.g. 5 major scenarios of the bottom 10 Hscans
Anonymized SIP    Dport   # DIP   Cause
98.198.251.168    135     64      Nachi or MSBlast worm
3.66.52.227       445     64      Sasser and Korgo worm
2.0.28.90         139     64      NetBIOS scan
98.198.0.101      135     64      Nachi or MSBlast worm
165.5.42.10       5554    62      Sasser worm
Online performance evaluation


Small memory access per packet
 16 memory accesses per packet with parallel recording
Small memory consumption
Online performance evaluation


Recording speed
 Worst case: recording 239M items in 20.6 seconds
i.e., 11M insertions/sec
Detection speed
 Detection on 1430 minute intervals



Average detection time: 0.34 seconds
Maximum detection time: 12.91 seconds
Stress experiments in each hour interval

Detecting the top 100 anomalies takes 35.61 seconds
on average and 46.90 seconds at most
Outline
Motivation
 Background on Sketches
 Design of the HiFIND system
 Evaluation
 Conclusion

Conclusion - Advantages
Achieves proposed goals including
scalability and distinguishing attack
types.
 Highly accurate on test data.
 Reduction in False Positives
 Very low memory usage (13.2 MB)

Conclusion - Disadvantages



HiFIND did not detect some small
horizontal port scans that TRW
detected.
Authors said these were a combination
of multiple small scans too stealthy for
their thresholds
Future work to further investigate this
and find a way to account for it.
Conclusion – Paper Disadvantages
Authors are vague on implementation,
only mentioning that it used a single
FPGA board.
 Authors do not explicitly define terms
(e.g. sketches).
 Authors do not explain or cite
heuristics used to reduce false
positives.

Thank You !
Questions?