Network-based and Attack-resilient
Length Signature Generation for
Zero-day Polymorphic Worms
Zhichun Li (1), Lanjia Wang (2), Yan Chen (1), and Judy Fu (3)
(1) Lab for Internet and Security Technology (LIST), Northwestern Univ.
(2) Tsinghua University, China
(3) Motorola Labs, USA
LESG (LEngth-based Signature Generation)
LESG builds on two observations: buffer overflow is one of the most common vulnerability types exploited remotely, and certain protocol fields might map to the vulnerable buffer.
The authors propose a three-step algorithm that generates protocol field length signatures with an analytical attack resilience bound.
Outline
• Motivation and Related Work
• Design of LESG
• Problem Statement
• Three Stage Algorithm
• Attack Resilience Analysis
• Evaluation
• Conclusions
Desired Requirements for Polymorphic Worm
Signature Generation[14]
• Network-based signature generation
– Worms spread at exponential speed, so detecting them at an early stage is crucial
– At an early stage there are few worm samples, and a single host is unlikely to see the early worm packets
– A high-speed network router may see more worm samples
• Signature generation must be fast enough to keep up with the network speed
• Noise tolerant
– Most network flow classifiers suffer from false positives
– Even host-based approaches can be injected with noise
• Attack resilient
– Attackers always try to evade detection systems
• Efficient signature matching for high-speed links
Design Space and Related Work
• Network based, exploit based: [Polygraph-SSP05], [Hamsa-SSP06], [PADS-INFOCOM05], [CFG-RAID05], [Nemean-Security05]
• Network based, vulnerability based: LESG (this paper)
• Host based, exploit based: [DACODA-CCS05], [TaintCheck-NDSS05]
• Host based, vulnerability based: [Vulsig-SSP06], [Vigilante-SOSP05], [COVERS-CCS05], [ShieldGen-SSP07]
• Existing vulnerability-based signature generation schemes are
host-based and cannot work at the network router/gateway
level.
Signature Generation Classes
• Two classes
– Vulnerability-based: inherent to the vulnerability that the worm tries to exploit
• Unique and hard to evade
– Exploit-based: capture certain characteristics of a specific worm implementation
• Less accurate and can be evaded
Exploit-based Schemes
• Find invariant substrings of exploit flows
– Polygraph[15], Hamsa[14]
• Find symbolic similarity by using full-system symbolic execution on every machine instruction
– DACODA[18]
• Find structural similarities between different worm binary codes
– CFG (Control Flow Graph) [24]
Vulnerability-based Schemes
• Use the properties of the vulnerable program
• A vulnerability signature matches all exploits of a given vulnerability
Basic Ideas
• At least 75% of vulnerabilities are due to buffer overflow
• A length signature is intrinsic to the buffer overflow vulnerability and hard to evade
• However, there can be thousands of fields, so selecting the optimal field set is hard
[Figure: a protocol message field overflowing the vulnerable buffer]
Deployment of LESG
First, sniff traffic from the network and classify it by application-level protocol.
Next, filter out known worms, then separate the remaining traffic into a suspicious traffic pool and a normal traffic reservoir.
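The two deployment steps can be sketched in Python as follows. This is only an illustration, not the authors' implementation: the port-to-protocol table, the flow dictionary layout, and the `is_known_worm` and `is_suspicious` callbacks are hypothetical stand-ins for the protocol classifier, known worm filter, and worm flow classifier.

```python
from collections import defaultdict

# Hypothetical port-to-protocol table; a real classifier may inspect payloads instead.
PORT_TO_PROTOCOL = {
    ("tcp", 25): "SMTP",
    ("tcp", 53): "DNS",
    ("udp", 53): "DNS",
    ("tcp", 80): "HTTP",
}

def split_traffic(flows, is_known_worm, is_suspicious):
    """Group flows by protocol, drop known worms, and split the rest into two pools."""
    suspicious_pool = defaultdict(list)
    normal_reservoir = defaultdict(list)
    for flow in flows:
        protocol = PORT_TO_PROTOCOL.get((flow["transport"], flow["dst_port"]))
        if protocol is None:
            continue  # protocol not handled in this sketch
        if is_known_worm(flow):
            continue  # already covered by existing signatures
        pool = suspicious_pool if is_suspicious(flow) else normal_reservoir
        pool[protocol].append(flow)
    return dict(suspicious_pool), dict(normal_reservoir)
```

Keeping one pool per protocol reflects the idea that signatures are generated separately for each application-level protocol.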
Framework
[Figure: LESG deployment framework. Components shown: Network Tap; Protocol Classifier (TCP 25, TCP 53, TCP 80, ..., TCP 137, UDP 1434); Known Worm Filter; Worm Flow Classifier; Suspicious Traffic Pool (real time); Normal Traffic Reservoir and Normal Traffic Pool (policy driven); LESG; Signatures]
[Figure: LESG Signature Generator]
Field Hierarchies
• Each application session (flow) usually contains one or more Protocol Data Units (PDUs)
• A PDU is a sequence of bytes and can be dissected into multiple fields
[Figure: field hierarchy of a DNS PDU]
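As a concrete illustration of dissecting a PDU into fields, the sketch below measures the byte length of each field in a DNS query, following the standard DNS message layout (a 12-byte header of six 16-bit values, then questions made of a length-prefixed QNAME plus 2-byte QTYPE and QCLASS). The function name and the field naming scheme are assumptions made for this example, and name compression pointers are not handled.

```python
import struct

def dissect_dns_query(pdu: bytes) -> dict:
    """Return a mapping of field name -> field length in bytes for a DNS query."""
    # Fixed header: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (16 bits each).
    _ident, _flags, qdcount, _an, _ns, _ar = struct.unpack("!6H", pdu[:12])
    lengths = {"header": 12}
    offset = 12
    for q in range(qdcount):
        start = offset
        # QNAME: length-prefixed labels terminated by a zero byte.
        while pdu[offset] != 0:
            offset += 1 + pdu[offset]
        offset += 1  # skip the terminating zero byte
        lengths[f"question[{q}].qname"] = offset - start
        lengths[f"question[{q}].qtype_qclass"] = 4  # QTYPE + QCLASS
        offset += 4
    return lengths

# Example: a query for example.com (type A, class IN).
query = (
    struct.pack("!6H", 0x1234, 0x0100, 1, 0, 0, 0)
    + b"\x07example\x03com\x00"
    + struct.pack("!2H", 1, 1)
)
field_lengths = dissect_dns_query(query)
```

A per-flow mapping of field ID to field length like `field_lengths` is the kind of input a length-based signature would be evaluated against.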
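Problem Formulation
• Input: a suspicious pool (with noise) and a normal pool
• Output: a LESG signature
• Constraint: the worms in the suspicious pool that the signature does not cover are at most a given bound
• Objective: minimize the false positives in the normal pool
• The problem is NP-hard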
Three Stages
• Step 1: Field Filtering
– Select possible signature field candidates
• Step 2: Signature Length Optimization
– Optimize the signature length for each field
• Step 3: Signature Pruning
– Find the optimal subset of candidate signatures with low false positives and false negatives
Stages I and II
• Stage I: Field Filtering, with thresholds COV ≥ 1% and FP ≤ 0.1%
• Stage II: Length Optimization, trading off specificity and sensitivity through the score function Score(COV, FP)
Stage I
Inputs:
• FP0: false positive threshold
• COV0: detection coverage threshold
• M: suspicious traffic pool; |M| is the number of suspicious flows in M
• N: normal traffic pool; |N| is the number of normal flows in N
• S: signature set
A signature is a pair Sj = (fj, lj), where fj is the signature field ID and lj is the corresponding signature length for field fj.
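A minimal Python sketch of these definitions follows. It assumes a flow is represented as a mapping from field ID to field length, and that a flow matches a signature when the field's length is at least the signature length; the matching rule is an assumption, since the slides do not spell it out. The default thresholds are the COV ≥ 1% and FP ≤ 0.1% values shown on the "Stages I and II" slide, and all function names are hypothetical.

```python
from typing import Dict, List, NamedTuple

Flow = Dict[str, int]  # field ID -> observed field length in bytes

class LengthSignature(NamedTuple):
    field_id: str  # f_j
    length: int    # l_j

def matches(sig: LengthSignature, flow: Flow) -> bool:
    # Assumed rule: the field is present and at least as long as the signature length.
    return flow.get(sig.field_id, 0) >= sig.length

def coverage(sig: LengthSignature, suspicious: List[Flow]) -> float:
    """COV: fraction of flows in the suspicious pool M that the signature matches."""
    return sum(matches(sig, f) for f in suspicious) / len(suspicious)

def false_positive(sig: LengthSignature, normal: List[Flow]) -> float:
    """FP: fraction of flows in the normal pool N that the signature matches."""
    return sum(matches(sig, f) for f in normal) / len(normal)

def passes_field_filter(sig, suspicious, normal, cov0=0.01, fp0=0.001) -> bool:
    """Stage I style check: keep a candidate only if COV >= COV0 and FP <= FP0."""
    return coverage(sig, suspicious) >= cov0 and false_positive(sig, normal) <= fp0
```

Each evaluation scans both pools once, so the cost per candidate grows with |M| + |N|.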
The total running time
Since |M| is usually far smaller than |N|, the overall time cost is
Stage II
• Optimize the length value of each candidate signature to find the best tradeoff between coverage and false positives
– If the selected length is too long, the signature covers fewer malicious worm flows
– If the selected length is too short, there will be many false positives
• Aims to maximize the score function Score(COV, FP)
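The tradeoff can be sketched as a search over candidate lengths for one field. The slides name a score function Score(COV, FP) but do not give its form, so the default `cov - fp` below is a placeholder assumption, as are the matching rule (field length at least the signature length) and the choice to try only lengths observed in the suspicious pool.

```python
from typing import Callable, Dict, List, Tuple

Flow = Dict[str, int]  # field ID -> observed field length in bytes

def match_rate(field_id: str, length: int, flows: List[Flow]) -> float:
    """Fraction of flows whose field is at least `length` bytes long."""
    return sum(f.get(field_id, 0) >= length for f in flows) / len(flows)

def optimize_length(
    field_id: str,
    suspicious: List[Flow],
    normal: List[Flow],
    score: Callable[[float, float], float] = lambda cov, fp: cov - fp,  # placeholder
) -> Tuple[int, float, float]:
    """Return (best length, COV, FP) for one field under the given score."""
    # Only lengths seen in the suspicious pool can change which flows are covered.
    candidates = sorted({f[field_id] for f in suspicious if field_id in f})
    best = None
    for length in candidates:
        cov = match_rate(field_id, length, suspicious)
        fp = match_rate(field_id, length, normal)
        if best is None or score(cov, fp) > score(best[1], best[2]):
            best = (length, cov, fp)
    return best

suspicious = [{"f": 300}, {"f": 310}, {"f": 15}]  # the 15-byte flow is noise
normal = [{"f": 10}, {"f": 20}, {"f": 40}]
best_length, best_cov, best_fp = optimize_length("f", suspicious, normal)
```

In this toy data a length of 15 covers every suspicious flow but also matches two of three normal flows, while 310 misses one worm flow, so 300 scores best under the placeholder score.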
Stage III
Find the optimal set of fields to use as the signature, with high coverage and low false positives.
Separate the fields into two sets, FP=0 and FP>0:
– Opportunistic step (FP=0)
– Attack Resilience step (FP>0)
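The two-set split can be illustrated with a generic greedy set-cover heuristic. The slides do not give the selection rule for either step, so this is not the authors' algorithm: the greedy ordering, the `min_new_fp_positive` threshold, and the matching rule (field length at least the signature length) are all assumptions made for the sketch.

```python
from typing import Dict, List, Tuple

Flow = Dict[str, int]        # field ID -> observed field length in bytes
Signature = Tuple[str, int]  # (field ID, signature length)

def matches(sig: Signature, flow: Flow) -> bool:
    field_id, length = sig
    return flow.get(field_id, 0) >= length

def fp_rate(sig: Signature, normal: List[Flow]) -> float:
    return sum(matches(sig, f) for f in normal) / len(normal)

def greedy_cover(sigs, uncovered, min_new):
    """Repeatedly pick the signature that covers the most still-uncovered flows."""
    chosen, remaining = [], list(sigs)
    while remaining and uncovered:
        best = max(remaining, key=lambda s: sum(matches(s, f) for f in uncovered))
        gain = sum(matches(best, f) for f in uncovered)
        if gain < min_new:
            break  # no remaining signature adds enough coverage
        chosen.append(best)
        remaining.remove(best)
        uncovered = [f for f in uncovered if not matches(best, f)]
    return chosen, uncovered

def prune(candidates, suspicious, normal, min_new_fp_positive=2):
    zero_fp = [s for s in candidates if fp_rate(s, normal) == 0]
    pos_fp = [s for s in candidates if fp_rate(s, normal) > 0]
    # Opportunistic step: signatures with FP = 0 cost nothing, so take any that help.
    chosen, uncovered = greedy_cover(zero_fp, list(suspicious), min_new=1)
    # Attack resilience step: accept an FP > 0 signature only for enough new coverage.
    extra, _ = greedy_cover(pos_fp, uncovered, min_new=min_new_fp_positive)
    return chosen + extra
```

Requiring a minimum gain before accepting a signature with FP > 0 is one simple way to keep a few injected noise flows from pulling a costly signature into the final set.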
Attack Resilience Bounds
Depending on whether deliberate noise injection (DNI) exists, we get different bounds.
With 50% noise in the suspicious pool, we get the worst-case bounds FN<2% and FP<1%.
In practice, the DNI attack can only achieve FP<0.2%.
Resilient to most attacks proposed in other papers.
Methodology
Protocol parsing with Bro and BINPAC
(IMC2006)
Worm workload
– Eight polymorphic worms created from real-world vulnerabilities, including the CodeRed II and Lion worms
– DNS, SNMP, FTP, SMTP
Normal traffic data
– 27 GB from a university gateway and a 123 GB email log
Results
Single/Multiple worms with noise
– Noise ratio: 0~80%
– False negative: 0~1% (mostly 0)
– False positive: 0~0.01% (mostly 0)
Pool size requirement
– 10 or 20 flows are enough, even with 20% noise
Speed results
– With 500 samples in the suspicious pool and 320K samples in the normal pool, DNS takes 58 seconds for parsing and 18 seconds for LESG
The range of the signatures we generated and their accuracy.
Conclusions
• A novel network-based automated worm signature generation approach
– Works for zero-day polymorphic worms with unknown vulnerabilities
– First work that is both vulnerability-based and network-based, using length signatures for buffer overflow vulnerabilities
– Provable attack resilience
– Shown to be fast and accurate in experiments